FreeNAS All SSDs?

I was going to reply to that thread, but I distinctly remember being too drunk to make a cohesive argument.

2 Likes

Fedora 27

05:00.0 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 03)
05:00.1 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 03)

They are in target mode so I can use targetd to export LUNs to my Proxmox box. I'm pretty sure I created a thread about a DIY SAN a while ago.
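
In case anyone wants a concrete starting point: targetd drives LIO under the hood, and the equivalent by-hand FC export with targetcli looks roughly like this (just a sketch; the backstore name, zvol path, and WWNs are placeholders for whatever your hardware actually reports):

```
# export a block device as a LUN over the qla2xxx (Fibre Channel target mode) fabric
targetcli /backstores/block create name=vm_lun0 dev=/dev/zvol/tank/vm_lun0
targetcli /qla2xxx create naa.21000024ff123456                               # WWN of the local HBA port
targetcli /qla2xxx/naa.21000024ff123456/luns create /backstores/block/vm_lun0
targetcli /qla2xxx/naa.21000024ff123456/acls create naa.21000024ff654321     # initiator (Proxmox) WWN
targetcli saveconfig
```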

1 Like

FWIW, I had 4-, 6-, and 8-disk arrays of 850 Pros on an LSI controller. Even with RAID 0 for giggles, it was underwhelming compared to a single 960 Pro NVMe, given the complexity involved. Particularly if the OP's use-case is a relatively small, fast array over 10GbE: you're capped at ~1GB/s, which pretty much any NVMe drive, or a pair of them, can deliver.

Still monkeying around with it: trying to bring up a 40GbE, 4x NVMe ZFS cache in front of a 10-HDD array, but I've now had 2 of the 10 drives require return for bad blocks (I'm just lucky, I guess).

My fall-back array is an Xpenology box with 10GbE, 4x 10TB HDDs, and a 4x 512GB 850 Pro SSD cache. It produces 600-700MB/s typical sustained sequential throughput once the cache has warmed up, and frankly, at the application level its performance is indistinguishable from either a single or dual (RAID 0) 960 Pro NVMe locally.

I'm doing database walks, so maybe a photo/video app fares worse, but I wonder… If you limit yourself to 10GbE, that dictates that not much disk hardware is required to saturate it.

1 Like

May I ask which LSI controller you are using to make use of more than 4 SSDs? I don't get to play with enterprise gear, so I'm just wondering.

And is this the controller card separate from the actual HBA card plugged into the disk caddies via breakout cables (SFF-8087 to SATA)?

What share protocol are you using, and is it utilizing RDMA?

This is all Linux, BTW: Linux server, Linux client, NFS mounts…

MegaRAID 9361-8i. Yes, a breakout cable to 4 SATA per port, for a total of 8. I also have a 4-port HighPoint RR840 card that supports 16 SATA without an expander (via breakout to 4 SATA per port). That's the one currently running the 10-HDD array.

Re: RDMA - I'm still learning about 40GbE. I've done some throughput testing with IB and RDMA, but I haven't gotten NFS + RDMA to work yet (haven't had much time for it, frankly).
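
For reference, the usual NFS-over-RDMA recipe looks roughly like this; I haven't verified it end-to-end on this hardware yet, and the export path and address below are placeholders:

```
# server: load the RDMA transport for nfsd and have it listen on the standard NFS-RDMA port
modprobe svcrdma
echo "rdma 20049" > /proc/fs/nfsd/portlist     # run this after nfsd is already up

# client: load the client-side transport and mount with the rdma option
modprobe xprtrdma
mount -t nfs -o rdma,port=20049 192.168.40.1:/tank/scratch /mnt/scratch
```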

Most of my 40G cards are VPI (dual mode) and dual port. IB mode (through the switch) has been cantankerous: ports dropping to 2.5Gbps lanes and lousy multi-thread performance. IPoIB without RDMA produces 12Gbps and very lumpy results in iperf3. RDMA benchmarks produce line rate, but… Still a work in progress.

I'm going to set up a star of point-to-point links for 4 compute nodes using multiple 2-port cards on the server. On the clients, I'm using the first port in Ethernet mode connected to that server, and the second in IB mode connected to the switch, until I've worked out all the kinks of IB mode or can find a reasonably priced 40GbE switch.

In point-to-point Ethernet (I don't yet have a 40GbE switch, only IB), I can get 32Gbps (line rate, considering encoding) with 2-4 threads in iperf3, and it scales well: the throughput is spread evenly over the threads regardless of how many there are, in contrast to IB mode, where one thread gets 12Gbps and the others are severely throttled. With 40GbE point-to-point, I've seen multi-GB/s transfers via NFS from my NVMe drives with little tuning beyond what was already in place for 10GbE (these machines are all on a 10GbE switch as well).
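
The test itself is nothing fancy, for anyone who wants to compare numbers (the address is just my point-to-point link):

```
# on the server end of the 40GbE link
iperf3 -s

# on the client: 4 parallel streams for 30 seconds
iperf3 -c 192.168.40.1 -P 4 -t 30
```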

2 Likes

Hey @cekim, thanks for your feedback!

Couple things that jump out at me:

I don't know the details of your database use-case, but generally, database operations are going to be a lot different from sequential media-file reads/writes. Specifically, they'll favor IOPS over sequential throughput, which is the opposite of my case.

Also, have you adjusted your block size to match your database? I have heard that this can greatly improve performance.
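
In ZFS terms that would be something along these lines (a sketch; the dataset name and the 16K figure are just examples, match it to your database's actual page size):

```
# create a dataset whose recordsize matches the database page size (e.g. 16K for InnoDB)
zfs create -o recordsize=16K tank/db
# or change an existing dataset (only affects blocks written from now on)
zfs set recordsize=16K tank/db
```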

As noted above, ZoL (ZFS on Linux) doesn't support TRIM yet, so that could be hurting your SSD performance…

Are you using ZFS or hardware RAID? (or something else?)

I experimented with XFS, ext4, and ZFS. I'm using ZFS now and planning on sticking with it. I have an ASUS Hyper M.2 x16 NVMe carrier card, and I'm working on getting x4/x4/x4/x4 bifurcation set up on that server (the ASUS Z10PE hides this, while ASRock's equivalent EP2C612 exposes the option in the BIOS by default, so the ASUS BIOS needs some *cough* adjustment). I'm going to use that to provide SSD cache to ZFS in the form of 4x 960 Pro 512GB NVMe.
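
Once the bifurcation works, the plan is roughly this (pool and device names are placeholders):

```
# add the four NVMe sticks as a striped L2ARC read cache
zpool add tank cache nvme0n1 nvme1n1 nvme2n1 nvme3n1

# optionally, partition a couple of them and mirror a small SLOG for sync writes instead
# zpool add tank log mirror nvme0n1p1 nvme1n1p1
```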

With the LSI card and 850 Pros, it saturated at about 6 SSDs, producing 2500-3000MB/s in extremely optimized test cases. More typically, it would top out at 1.5-2GB/s for bursty access in RAID 0. There was a little difference between XFS and ext4, but nothing huge.

I saw similar behavior from the HighPoint before moving it over to the HDD array. They both delivered as promised and could saturate the 10GbE link, but then the 960 Pro showed up, and a single stick did roughly the same as 6-8 SATA disks.

I wasn't trying to suggest the SATA SSDs performed poorly; they scaled pretty much as advertised up to the limit of the HBA and easily saturated a 10GbE link with sequential, large-block access. I'm just pointing out that now that 1-3GB/s NVMe drives are out there, 6 or 8 SATA SSDs no longer make much sense if it's just speed you're after (space is another matter, given cost/GB). Do 2x NVMe in RAID 1, or RAID 0 plus backups, or just a single 2TB stick, and call it a day with a much simpler setup.

1 Like

In my case, the storage requirements aren't negligible, just not what I'd want to use spinning drives for. Currently I'm planning on 2 RAIDZ2 vdevs of 12x 500GB 860 EVOs each (~10TB of usable space) with 64GB of RAM. I'm getting a good deal on a used box at the price of having to settle for a SAS-2 backplane and HBA, so throughput is going to be capped at ~2400MB/s anyway.

I’m not committed to the RAIDZ2 config, but I figure striped mirrors would be overkill for the SAS-2 bandwidth. If I can get respectable speeds out of 2 bonded 10GbE lines, then I’m good.
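
If it helps anyone picture it, that layout would be created roughly like this (device names are placeholders; in practice you'd use /dev/disk/by-id paths, and the two RAIDZ2 vdevs get striped automatically):

```
zpool create tank \
  raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl \
  raidz2 sdm sdn sdo sdp sdq sdr sds sdt sdu sdv sdw sdx
```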

3 Likes

I think you'll end up with a nice, fast setup doing that, provided your controller can keep up (older controllers were generally built assuming they'd never see something as fast as an SSD, so they can have surprising bottlenecks).

The questions are whether your speed/cost/size/application-performance trade-off is optimal, whether you care about optimal, and how much time you are willing to invest in experimentation. When it comes to disk space, whatever number I think I can live with in trade for speed eventually becomes too little. Whatever tolerance I think I have for failure also understates the pain of recovering from failures, which happen a lot with disks (more so with spinning ones) and always at the worst possible time.

I have to qualify all that with the observation that getting to a caching and redundancy setup I am satisfied with has been a long process that really isn't complete. I have learned that you can produce a high-performance setup with fast NVMe and/or SSDs as cache, backed by a redundant array for capacity and data safety.

BUT:

In terms of absolute dollars to produce the headline 2GB/s round number you are headed for, it will likely cost as much if not a little more. It would potentially have significantly more space (12 spinning 2TB or 4TB drives with 2 NVMe drives for cache would be 20TB or 40TB respectively, with 2-drive fault tolerance).

2TB disks are not going to produce per-unit throughput as high as 4TB, which won't be as high as 8TB, of course… but 150MB/s is a decent round number per unit (reflecting the difference between inner and outer tracks and 2TB vs. 8TB rates). I do know 10-12 HDDs in RAIDZ2, particularly with compression enabled, can deliver 1GB/s+ throughput. So your sustained sequential is about that, with the NVMe front end delivering much more for cached entries and writes.
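
Compression is cheap enough that I just leave it on pool-wide (pool name is a placeholder):

```
# enable lz4 for everything written from here on, and check how well it compresses later
zfs set compression=lz4 tank
zfs get compressratio tank
```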

Once you’ve gone past 4 drives, you can start to do some nifty things with such arrays.

1 Like

Why NFS and not Samba? Since 4.? it should support RDMA (SMB Direct) on ConnectX-3 and above.
Even Connect-IB has SMB Direct support.

But I have to admit that I didn't get to test RDMA on Ethernet.
On IB it is night and day vs IPoIB (Captain Obvious).

Guess I'm gonna join the 40GbE club in a month or so :)

All Linux on this subnet, so I didn't even consider using SMB. MSFT has a long track record of terrible I/O, so there is no association in my brain between high-throughput file systems and anything Windows. :wink:

RDMA + NFS doesn't look too difficult; I just haven't had the cycles to sort it out, given that I'm having issues getting the IB ports to auto-configure and stay at full speed.

I got things working point-to-point via 40GbE and set up the IB switch on secondary ports for experimentation, so once I sort out getting everything running at 4x10Gb rather than 4x2.5Gb through the switch, I can try again.

When I forcibly set the ports to 4x10Gb, they stay there, but after a reboot they are back to 4x2.5G… Well, some of them… Others are always at 4x10G, so I need to go through the exercise of swapping cables to see if it's just a bad cable (though none kick out errors; they just don't auto-negotiate to 4x10, aka 40Gb, for some reason).
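
For the record, this is roughly how I've been checking and forcing things (the LID/port numbers below are made up; ibstat and ibportstate come from infiniband-diags):

```
# check the negotiated rate on the local HCA ports
ibstat | grep -i Rate

# query a switch port by LID/port, restrict it to QDR (10Gbps per lane), then bounce the link
ibportstate 3 1 query
ibportstate 3 1 speed 4
ibportstate 3 1 reset
```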

These cards (Mellanox MCX354A VPI) with the latest firmware auto-configure well for Ethernet: given any excuse, they will switch over to Ethernet at 40Gb. IB less so; it appears to want manual configuration for that.
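
If it's useful to anyone, the port personality on these VPI cards can be pinned with mlxconfig instead of relying on auto-sensing (the device path is whatever `mst status` reports on your box):

```
mst start
mlxconfig -d /dev/mst/mt4099_pciconf0 query | grep LINK_TYPE
# 1 = IB, 2 = ETH, 3 = VPI/auto-sense; pin port 1 to Ethernet and port 2 to IB,
# then reload the mlx4 driver (or reboot) for it to take effect
mlxconfig -d /dev/mst/mt4099_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=1
```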

1 Like

Interesting.
Thumbs up for everything Linux; I wish I could do that too.

Turns out those cards don't support SMB Direct, at least not according to their product brief.
And interestingly, those cards are listed as 56GbE-capable with Mellanox switches.

What switch and cables are you using?

And why would you want to use those cards in IB mode, considering there's less software hassle on Ethernet and the same capabilities?
If you mainly want to use IB, you might want to take a look at Connect-IB cards :)
I think I know a US seller that is cheap and tricky to find.

If someone went crazy right now…

Patriot Ignite 480G SSDs are on sale for $95

1 Like

Oops - missed this…
IB because I had an IB 40G switch handy. No other reason. 40G Ethernet switches are not cheap.

Ethernet IS definitely easier and preferable.

2 Likes

No problem,
what switch is it?

There are "cheap" EMC FDR switches out there that are InfiniBand-only, but someone figured out how to flash them to MLNX-OS with Ethernet support.

Is it one of those?

Not aware of a means of doing a soft conversion for the 4036E.

Well, that might be right.
I meant this post: https://forums.servethehome.com/index.php?threads/beware-of-emc-switches-sold-as-mellanox-sx6xxx-on-ebay.10786/

That's for the SX60XX series.
I do have an SX6018 sitting around…

Wow, you guys, thanks for the info. I've been at this for a couple of years, and this thread sums up my experience. In my brief moments of lucidity (when not dealing with Microsoft issues), I've come to the conclusion that the following may be best for a FreeNAS ZFS setup.

RAID6/RAIDZ2 with 4-8x 12TB SAS drives backed by NVMe or Optane cache, dual 10GbE NICs, and about 64GB of RAM. An 8-bay server with that setup should cover both a photography and a video server.

For VM hosting, a similar setup, except with 4x 2TB NVMe/Optane in RAID 10 and 2-4x 12TB SAS in RAID6/RAIDZ2 as backup/bulk storage. Use iSCSI or NFS for connectivity to the compute cluster.

I am still debating whether enterprise-class SSDs are an absolute must. Is it safe to go with a WD Blue, an Intel M.2, or a Samsung Pro, or does it matter?

Does this sound optimal? I have yet to see why this wouldn’t be the way to go. Thoughts?

This might get closed because it is pretty old. If you're interested, I went a different route and built a pure-SSD NAS. I am running it just as storage and only use NFS and AFP (the Apple protocol). Currently it is almost fully populated with 10 Crucial MX500 2TB SATA drives. And I'll probably switch the RJ45 for SFP+ since I now have a switch in the mix.