NAS Linux OS recommendation that supports 100 Gbps InfiniBand

What would people recommend as a NAS OS that supports 100 Gbps 4x EDR InfiniBand (Mellanox ConnectX-4 dual port VPI 4x EDR 100 Gbps InfiniBand MCX456A-ECAT)?

I was looking at Rockstor, which uses btrfs, but its documentation says that RAID5/6 is experimental and, due to the nature of btrfs, not reliable.

I had also looked into FreeNAS, and I think it supports the Mellanox 100 Gbps InfiniBand network adapter. But I was also watching Wendell's latest video comparing unRAID and FreeNAS, and he stated that there are still some things unRAID does better than FreeNAS.

I have no idea if unRAID supports the 100 Gbps InfiniBand adapters or not. I was under the impression, when I looked into this a few months ago, that unRAID still doesn't support these adapters.

I'm looking at this in order to replace my current Qnap TX-832 NAS units, because nothing that Qnap makes supports 100 Gbps InfiniBand either.

Hence, why I’m here.

NAS OS recommendations are greatly appreciated.

Thank you.

@freqlabs might know about FreeNAS support.

VPI means you should be able to set the port to Ethernet mode, which you’ll want to do if you want to be able to configure through the UI.
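For reference, on VPI cards the port protocol is a firmware setting you can flip with the Mellanox Firmware Tools (mlxconfig). A sketch; the MST device path below is an example and will differ on your system:

```shell
# Start the MST service and query the current port protocol
mst start
mlxconfig -d /dev/mst/mt4115_pciconf0 query | grep LINK_TYPE

# Set both ports to Ethernet (1 = InfiniBand, 2 = Ethernet, 3 = VPI auto-sense)
mlxconfig -d /dev/mst/mt4115_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2

# A reboot (or driver reload) is required for the new link type to take effect
```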

But this would mean that FreeNAS only supports 100 GbE, not InfiniBand, correct?

(Unfortunately, I don't have a 100 GbE switch; I have the 100 Gbps InfiniBand switch, and you can't change a port from IB to ETH on the switch itself, which is an entirely separate gripe I have with Mellanox.)

Why not install FreeBSD and add the FreeNAS utilities on top, with your InfiniBand setup already done?

That’s not really an option.

To clear up a few things:
FreeNAS, and I think FreeBSD in general, comes with the Mellanox drivers for InfiniBand and those cards in general:
mlx4_core, mlx4_en, mlx4_ib.
I have no clue about unRAID, but the drivers are free and open source, made for Linux and ported to BSD.
I myself run a point-to-point 40 GbE link with ConnectX-3 MCX354A-FCBT cards.
Also VPI cards.
The drivers are in the rdma-core package on the AUR, if I remember correctly.
And I'm running FreeNAS.
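Checking whether those modules are present on the FreeBSD/FreeNAS side is quick. A sketch with the ConnectX-3 (mlx4) module names; ConnectX-4 uses the mlx5 equivalents:

```shell
# See whether the Mellanox modules are already loaded
kldstat | grep -i mlx

# Load them by hand if not (ConnectX-3 shown; ConnectX-4 would use mlx5)
kldload mlx4 mlx4en

# Make it persistent across reboots
echo 'mlx4en_load="YES"' >> /boot/loader.conf
```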

The major difference between IB and ETH is that with IB you need software support on all sides, which is scarce.
Additionally, you need a subnet manager running on the network for it to work.
IB has IPoIB for IP emulation over IB, which works but doesn't perform as well as ETH mode itself.
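For reference, bringing an IB link up on a Linux host is roughly this (interface name and address below are examples):

```shell
# One subnet manager per IB fabric is enough; OpenSM is the usual choice
systemctl enable --now opensm

# Check the fabric: port state should show "Active" once an SM is running
ibstat
sminfo

# IPoIB: the IB port appears as a normal network interface (often ib0)
ip addr add 10.0.0.1/24 dev ib0
ip link set ib0 up
```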

ETH mode just works™.

The much harder thing to get is performance, because you will be limited by CPU power anyway.
Additionally, some protocols suck in that regard.

RDMA is the light at the end of the tunnel.
In theory, everything supports RDMA.
But not currently with FreeNAS, as far as I'm aware.

SMB Direct on the Samba side is still nowhere to be seen, I think.
And even if it were there, all the client applications would need to support it too, which will take even longer.

NFS and iSCSI do work well with RDMA, but FreeNAS lacks the kernel modules for that.

You might need a custom kernel on that side.
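On a plain Linux client (not FreeNAS), NFSoRDMA is just a mount option once the transport module is loaded; the server name and export path here are placeholders:

```shell
# Load the RDMA transport for the NFS client
modprobe rpcrdma

# 20049 is the IANA-assigned port for NFS over RDMA
mount -t nfs -o rdma,port=20049 server:/export /mnt/share
```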


To elaborate a bit on it.
I don’t think that drivers for the card will be your problem.

The main problem will be what you want to achieve with it performance wise, and the software stack that you have or want to use.

FreeNAS, at least, will work out of the box.
Getting the drivers on Linux shouldn't be a problem either, and on Windows you simply get WinOF from the Mellanox page itself. Done.

This is correct, the FreeNAS kernel config does include OFED for InfiniBand drivers. I don't know how well supported IB is by the UI; IPoIB might be fine. I've never used it myself, so I can't say for sure.

@RageBone is mostly correct.

In my testing, IB is actually between 1-3% faster than ETH, but he is correct in saying that with IB, you do need to have a subnet manager running (I already have a subnet manager, OpenSM, running, so that's not really going to be an issue for me).

He is also correct in stating that ETH mode just "works" in the sense that you DON'T need a subnet manager running. But if you want to connect more than two systems together, you'll need a 100 GbE switch, which Mellanox pretty much daylight-robs you for. (A 16-port 100 GbE switch can range from $7,325-$9,975 vs. $11,095 for a 36-port 100 Gbps 4x EDR IB switch. In other words, the total switching capacity of the 100 GbE switch is less than that of the IB switch, and the per-port price is about 50% more for 100 GbE than for 100 Gbps 4x EDR IB. This is stupid because Mellanox HAS their VPI technology, which means they could put it on their switches and let you configure ports on the switch like you can on their cards, but they won't do that, because they can rob you blind like this instead. But I digress; that's an entirely separate issue and discussion altogether.)

The other caveat to @RageBone’s reply is that RDMA over IB is more “straightforward” (if you can call it that) than RoCE, even if you only have one type/stream of traffic going through the 100 GbE interface/protocol.

To deploy RDMA over IB, you really just need the rdma-core packages, and then you can set up NFSoRDMA with the "inbox" drivers, but NOT the Mellanox OFED Linux drivers. (Mellanox took that out. ALSO go friggin' figures.)
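For completeness, the server side with the inbox Linux drivers is also short; the export path is an example:

```shell
# Server: enable the RDMA listener for the kernel NFS server
modprobe svcrdma
echo "rdma 20049" > /proc/fs/nfsd/portlist

# Export as usual, e.g. in /etc/exports:
#   /export  10.0.0.0/24(rw,sync,no_root_squash)
exportfs -ra
```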

So as a standard, standalone server, my system does all of that fine. But that brought me to the question: okay, what if I want to merge it with a "pretty" NAS or NAS-like GUI, so that I can manage my system better and more efficiently whilst still maintaining all of the high speed connectivity options? (Some stuff is easier to set up with a GUI, and in reality, most GUIs for *nix should just be script generators anyways, I think. But I'm not a programmer, so maybe there's more to it than that.)

That’s kind of where my thought processes went after getting my systems up and running.

Performance, on the whole, is actually limited by the fact that I am using large capacity (6-10+ TB) mechanically rotating drives, because U.2 PCIe 3.0 x4 NVMe SSDs are still a tad too expensive for me to deploy at my scale.

And the primary reason why I went with 100 Gbps IB is because my cluster needed it as the system interconnect, but I also don’t want to deploy the additional layer of 10 GbE for the data management and traffic throughput. I figured that since I already deployed 100 Gbps IB, I might as well try and take advantage of it.

And yes, I've thought about going the SMB Direct route as well, but then I'd be using Windows Server instead of *nix. I've also thought about deploying iSER, as that seems to be the RDMA storage subsystem that's officially supported right now by Mellanox on *nix systems, since they took NFSoRDMA capabilities out of their own OFED drivers, for who knows why.


So, SMB Direct on anything other than Windows seems to currently not be an option.

But is that actually an issue?
With the HDDs, as long as you don't have a shitty CPU like I do, the hard drives should be the limit.

On the other side, I don't think FreeNAS comes with any RDMA capability. At least, I haven't seen any kernel modules listed in kldstat that would imply that it has.

So NFSoRDMA and iSER, or even SRP, should therefore not be possible on FreeNAS.


So SMB Direct on anything other than Windows seems to currently not be an option.

But is that actually an issue?

Yes, if I want my other Linux systems to be able to access the shared folders over SMB Direct.

Linux is "weird" in the sense that it doesn't ALWAYS necessarily work, but when it does, it can be anywhere between 10-40% faster at doing exactly the same task with the same program/application than Windows.

So…to that end, yes, it does and will matter.

With the HDDs, as long as you don't have a shitty CPU like I do, the hard drives should be the limit.

Also depends.

I currently have four HGST 6TB 7200 rpm SATA 6 Gbps drives in RAID0 on a Broadcom/Avago/LSI MegaRAID 9341-8i 12 Gbps SAS RAID HBA card, and when I run "iostat -dxm /dev/sdb1 3", it shows 100% utilization, but only around 45-60 MB/s throughput (moving .par2 files from the HDD RAID0 array to a four-drive SSD RAID0 consisting of four Samsung 860 EVO 1 TB SATA 6 Gbps SSDs on the same RAID HBA).

What's also interesting is that, in my benchmarking, after I've calculated the "double parity" on said .par2 files (I've given those files the extension .par2.par2), I can move the data back to the same four HDDs in RAID0 at 800 MB/s (not a typo). So, to that end, yes, a CPU can matter more than the drives themselves.

I'm not really sure WHY it does this, and I also can't really tell if it's because the XFS-formatted, four-HDD RAID0 is highly fragmented in Linux. But if that were the case, I would think it shouldn't be able to write the data back at said 800 MB/s when it was only able to move the file off the HDD RAID0 array and onto the SSD RAID0 array at between 45-60 MB/s. Not really sure what's going on there, but again, I digress. Another discussion for another time.
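A repeatable direct-I/O benchmark would help separate page-cache effects from real array throughput when chasing numbers like these. A sketch with fio; the mount point and sizes are assumptions:

```shell
# Sequential write then read against the RAID0 mount, with --direct=1
# to bypass the page cache so the numbers reflect the disks themselves
fio --name=seqwrite --directory=/mnt/raid0 --rw=write --bs=1M --size=8G \
    --ioengine=libaio --direct=1 --group_reporting
fio --name=seqread  --directory=/mnt/raid0 --rw=read  --bs=1M --size=8G \
    --ioengine=libaio --direct=1 --group_reporting
```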

So NFSoRDMA and iSER, or even SRP, should therefore not be possible on FreeNAS.

This is what I was afraid of.


Seems like that you can either get “pretty” or you can get “performance” but not both at the same time.

There is always the “do a lot of hard work” option :wink:

You could build your own kernel for FreeNAS with those modules.

If it is just building them in, that is fairly straightforward, as far as I know from the IB days when we had to do that.

Did I read, or rather understand, that correctly: that your parity calculation is what is hindering you right now?
RDMA wouldn't really help you there, I think.

On the options side, have you heard about the Cockpit thing?

That can maybe be "pretty" and enable you to ditch FreeNAS if you like.

No, sorry.

It's not the process of calculating the parity that's holding me back, it's moving the .par2 files around (for some strange reason) – maybe due to the nature of the .par2 files (more random, and therefore harder to compress when you are trying to move/manipulate them).

The parity calculation process actually isn’t that bad. Towards the end, I was able to calculate the parity and generate the .par2 files faster than I was able to move said .par2 files around.

No, I haven’t heard about the Cockpit thing, so I’ll have to go check that out.

re: making my own kernel
I’m not a programmer, so I’m not very good at doing stuff like that.

I would prefer more of a "turnkey"/quick-to-deploy solution, so that I can focus my energies on my actual engineering work rather than tending to the IT matters that are necessary and required in order to support my engineering work.

Thank you.


Well, I guess turnkey is the problem; I don't think there is anything out there really capable of doing that.

So, on the OS question again, I'd say it depends on the share type used and the clients.

If it is Windows clients, and therefore SMB, there is no way around Windows on the server, because that is currently the only thing that can give you RDMA with SMB.

If you can go RDMA on NFS or iSCSI, Linux and FreeBSD should be good with RDMA for those. Sadly, Windows clients will suck with both.

In theory, it isn't that hard to get RDMA capability into FreeNAS, but that would be uncharted territory, I guess.

Time for me to have a look at that with my 40 GbE gear.

Yeah… Qnap supports up to 40 GbE out of the box with their versions of the Mellanox adapters, but not 100 GbE, and definitely not InfiniBand.

And what's worse is that Mellanox, being a free-market, capitalist company, charges 50% more per port for 100 GbE vs. 4x EDR InfiniBand, despite the fact that they own the rights and the technology to specify, within their firmware, whether a port is IB or ETH via their VPI technology.

But instead, they choose to charge their customers 50% more per port for the "privilege" of being able to run 100 Gbps over Ethernet instead of InfiniBand, even though their VPI cards can do either already.


I don't get where you are going with that.

I mean, are the IB and the ETH cards physically different?
If not, you can probably flash them over from IB to ETH.

If they are actually labeled as VPI, you can choose which protocol to use at runtime easily.

At least with CX3, and I assume CX4, you have great driver support over basically all OSes.

The problem is not the driver; the problem is the software stack used to saturate that performance in combination with the hardware.
The CPU is the limit at some point.
So things like RDMA and NVMe-oF are there to basically offset the core of the issue.

So again, the OS choice for your NAS or server should not depend on which OS gives you support for the NICs (they all do); it should depend on which one gives you the best performance / software stack for your situation.

iSCSI is great, and iSER is its RDMA version, but Windows does not have iSER capability. Windows only has SMB Direct, aka SMB with RDMA.
Samba is not there yet as far as I know, so any base other than Windows is sadly out of the game in that case.
iSCSI is not multi-user capable, I think; I mean, one device for one user, not multiple users on one device.

So, NFS with RDMA exists and has great support on the Linux side; I can't speak to the FreeBSD side, but sadly, FreeNAS does not come with RDMA capability, to repeat that.

So, what is the specific situation you are asking this for again?

If you don't need RDMA, FreeNAS is fine.

I don't know if you can actually flash an EN card to be not-an-EN card. (Because if that were true, then in theory, you might be able to pick up the EN cards at a lower cost than the VPI cards and just flash them into VPI cards. I think there might be electrical differences, but nothing that I can tell immediately with the naked eye, and not without specifically studying the differences between the two.)

Yes, you can choose the PORT to be either IB or ETH, but you CAN'T choose it on the IB switch.

I have a minimum of 6 nodes now in my home network that are all running 4x EDR IB (100 Gbps), and with 6 systems, you get 15 different pair-wise combinations. There aren't enough ports (nor enough PCIe slots on any single, given system) to be able to connect them using only pair-wise connections.
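That pair count is just n(n-1)/2, which grows fast with node count; a quick sanity check:

```python
from math import comb

def pairwise_links(nodes: int) -> int:
    """Point-to-point links needed to fully mesh `nodes` hosts."""
    return comb(nodes, 2)

print(pairwise_links(6))   # 6 nodes -> 15 links
print(pairwise_links(8))   # 8 nodes -> 28 links
```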

But as mentioned, not all OSes support 100 Gbps IB, especially anything that has to do with RDMA (e.g. NFSoRDMA).

As others have mentioned already, each OS option has its pros and cons and will only support a subset of the things that I am looking for from a NAS OS WITH 100 Gbps IB support.

I sort of like the "pretty" GUI interface that I have now with my Qnap NASes for managing the system (it's definitely a lot easier than running a full-blown Linux distro like SLES or RHEL/CentOS), but Qnap doesn't have 100 GbE, let alone 100 Gbps IB, in their product plans any time soon.

The backbone of my home network is 100 Gbps 4x EDR IB.

As I mentioned, 6 of my systems run off of that network, and they all run Linux: 5 are running CentOS and 1 is running CAE Linux 2018 (which is built on top of Xubuntu 16.04 LTS).

So what I am looking for is a NAS OS that has a “pretty GUI” like Qnap, but also supports the 4x EDR IB because that’s the backbone of my home network.

100 GbE is out because I don't have a 100 GbE switch (and I refuse to pay 50% more per port because Mellanox is being capitalist b----es about it), and I already have my 36-port 4x EDR 100 Gbps IB switch/infrastructure in place. (Hence how and why I have 6 nodes running on my 4x EDR IB 100 Gbps network backbone.)

RDMA is required in order to maximize the transfer speeds.

If FreeNAS supported 100 Gbps 4x EDR IB with NFSoRDMA, it would probably have been the preferred option that checks all of these boxes. But as stated, it doesn't, and so it appears there are precisely zero OSes that match all of the criteria I am looking for, for this purpose, in a "NAS OS".

(I've run full-blown OS distros/installations on the server before, and I have found that administering through a GUI is a LOT faster, as it automates a lot of the underlying processes that I would normally have to copy-and-paste commands for. That is where I see GUIs as a tool to help me do things faster/more efficiently, and it's why I want to and would prefer to stay with a GUI for system administration.)

I don't really know nor understand why FreeNAS doesn't support RDMA. That seems strange to me given its *BSD roots.


A bit of digging later: NFSoRDMA seems to need NFS v4.1, which appears to not yet be in FreeNAS stable.

But it appears they are trying to get something into 11.3, which is due in the near future.

Since your current NAS isn't supporting IB at all, why not switch over to FreeNAS or unRAID, which both should support IB in my opinion? And since I think you don't currently have RDMA, not having it now wouldn't mean a regression.

I have no experience with unRAID or FreeNAS on CX4, but I think CX4 uses the same modules as CX3 and should therefore work.

If you like, you can send me two cards and I'll test for you :D