Hi, long time lurker, first time poster. I have a home lab that’s responsible for an approx 200TB NAS, lots of VMs, home automation functionality, a DaVinci Resolve render server, and, crucially, security cams including AI functions, 24/7/365 onsite recording, and access control throughout the property.
All of this has been working wonderfully for some years now. However, we’re now at a point where we cannot afford to lose access control and security camera recording for an hour each time the NAS needs an update or some work done on it. The drives themselves are in a SAS3 JBOD with dual path already set up using 2 separate HBAs on the NAS server (I knew the long term plan was ultimately to go HA, and this was an easy first step). Currently running TrueNAS with an SSD write cache and an Optane metadata log. This provides the necessary throughput and low enough latency to handle all the tasks discussed above while I’m editing 4K video footage directly from the NAS over a 25GbE network link.
Now for the problem… Or at least the bit where I’m unsure how to proceed…
Obviously, moving the NAS to HA will require at least 1 more NAS server. That’s fine; I have a few spare 2RU servers I can make work. What I’m not sure about is how to deal with the SSD write cache and Optane metadata log in an HA environment. I’m trying to figure out whether I’m best off sticking with TrueNAS or whether it may be time to make the leap over to Ceph.
Any and all help or suggestions on where I should go to find the answers I seek would be most appreciated. I definitely don’t mind doing some research / homework, but given the nature of the project at hand, this is something where I absolutely must make sure I have worked out all the kinks before I deploy it.
This. Keep in mind that Ceph is a completely different beast, and setting it up is quite a chore in itself. And you need 5 nodes minimum.
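Part of the reasoning behind that node count can be sketched with simple replica arithmetic (the `size=3` below is Ceph’s common default for replicated pools, and the node counts are illustrative; whether 5 is the true practical minimum is debatable):

```python
# Rough sketch of why very small Ceph clusters are fragile: with a
# replicated pool (size = number of copies, one copy per node), the
# cluster can only re-create full redundancy after a node failure if
# enough nodes survive. size=3 is a common default; node counts are
# illustrative assumptions.

def can_self_heal(nodes: int, failed: int, size: int = 3) -> bool:
    """True if the surviving nodes can still hold `size` full copies."""
    return nodes - failed >= size

print(can_self_heal(nodes=3, failed=1))  # False: runs degraded until the node returns
print(can_self_heal(nodes=5, failed=1))  # True: can rebuild redundancy elsewhere
```

In other words, a 3-node cluster with 3-way replication has nowhere to re-replicate to after losing a node, which is one reason the usual advice is to go beyond the bare minimum.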
There are probably ways to handle multipathing and to choose which system is accessing the drives. But I’d say, given that you have a write cache and metadata log, you can’t use that HA feature, because those devices are only connected to one system.
Thanks for the comments so far. OK, so a little more detail. When I say high availability, I don’t need 5 nines+ across the board; when the NAS requires work of some kind, it’s perfectly OK if all it’s able to do is continue to handle the security cams, AI tagging, access control, and home automation. No one needs to edit 4K video or stream movies over the network while it’s in a degraded state. Also, I live in an inner-city area where space is at a premium, and my current rack space is all I have to work with. Adding an extra 1U or 2U head server to essentially load-share with the current one is feasible; 4 more nodes for a 5-node Ceph cluster (I’ve been told a 3-node Ceph cluster just won’t give the speed I need, particularly for live 4K video editing directly from the array), or another 4U JBOD, just aren’t options I have available. Oh, and yes, the JBOD is full of factory-reconditioned 16TB Exos drives.
2 options that people elsewhere have suggested today are:
Put the SLOG and metadata cache on end-to-end NVMe, or NVMe over fabrics (FC / RoCE). This would allow 2 servers to have access to the same physical resources; I’m just not sure how nicely TrueNAS / ZFS is going to play if neither server has exclusive access to the logs. And yes, that solution adds latency. I’m not too concerned about the ~25% latency hit to the SLOG, as it’s generally only going to bite when writing huge numbers of very small files to the NAS, which isn’t a super common occurrence for our use case. The metadata log, on the other hand, is on Optane specifically because Optane latency is so low; attempting to push that over fabrics could really destroy the ultra-low latency that was necessary to make this spinning-rust array work for all the jobs I require of it. Replacing a significant portion of the spinning rust with SSD / flash is also not currently an option, but may become more viable in the future.
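To put rough numbers on that concern (all figures below are illustrative assumptions, not measurements of this hardware): a fabric hop adds a more-or-less fixed overhead per I/O, so the faster the device, the larger the *relative* penalty:

```python
# Illustrative sketch: a roughly fixed per-I/O fabric overhead hurts
# low-latency devices proportionally more. All numbers are assumptions,
# not measurements.

FABRIC_OVERHEAD_US = 10.0  # assumed NVMe-oF/RoCE round-trip overhead, microseconds

devices = {
    "NAND SSD SLOG": 40.0,  # assumed native sync-write latency, us
    "Optane":         6.0,  # assumed native latency, us
}

for name, native_us in devices.items():
    remote_us = native_us + FABRIC_OVERHEAD_US
    penalty_pct = (remote_us - native_us) / native_us * 100
    print(f"{name}: {native_us:.0f} us local -> {remote_us:.0f} us over fabrics "
          f"(+{penalty_pct:.0f}%)")
```

With these assumed numbers, the same fixed overhead that’s a ~25% hit on a NAND SLOG multiplies Optane’s latency several times over, which is why exporting the metadata device over fabrics tends to defeat the reason for buying Optane in the first place.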
The other option I’ve had pitched to me: if the only core functions that are mission-critical / actually require some degree of high availability are the security cam recording and associated AI stuff, access control, and home automation, perhaps I’m better off carving those functions out of the current NAS server and running a 2- or 4-blade cluster that’s only responsible for those functions. I do have a 2U 4-node server I could rearrange things and use for that purpose, if the consensus is that this is the best path forward.
There are the Tyan Transport 4-node 2U servers (i.e. 4 servers in 2U of space) that could work for Ceph storage, maybe even a hyper-converged cluster (if you add a 5th node, perhaps a 1U node you might already have).
That’s always a good practice anyway. Having a beefy server is cool and all, but if HA can be better handled by multiple nodes that don’t need to go as fast, then try that route instead.
If price is a concern, you can do a lot in 2U of space with a rack tray and some mini-PCs… wink, wink, nudge, nudge.
They won’t be as fast as EPYC servers, but they’ll only suck up a few watts of power each. IDK how you’ll be handling the AI stuff (I assume your hypervisor will remain on the other box, particularly because you need the GPU acceleration?).
I think the concern was that the other NAS also handles VMs and when there’s an update, you need to reboot it. Well, that’s a problem that a real hypervisor can fix (no, I don’t consider truenas a hypervisor, even if it can do double-duty as 1).
I can’t believe I’m recommending Proxmox again, particularly for HA (which in my own lab gave me nightmares after 2 nodes failed following a power outage and my cluster got @#$%ed). With Proxmox, if you have 3 nodes (or just 2 beefy nodes and a small “witness” / tie-breaker), you can easily have HA on them.
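The reason the witness matters is just quorum arithmetic: the cluster only keeps operating while a strict majority of configured votes is present. A minimal sketch of that rule (node counts are illustrative):

```python
# Minimal sketch of the quorum rule a Proxmox/corosync cluster follows:
# the cluster operates only while a strict majority of all configured
# votes is reachable. Vote counts below are illustrative.

def has_quorum(votes_present: int, votes_total: int) -> bool:
    """Strict majority: more than half of all configured votes."""
    return votes_present > votes_total // 2

# Plain 2-node cluster: lose one node and quorum is gone.
print(has_quorum(votes_present=1, votes_total=2))  # False

# 2 nodes + a tie-breaker vote (3 votes total): one node can fail.
print(has_quorum(votes_present=2, votes_total=3))  # True
```

That’s why the tie-breaker doesn’t need to be a beefy machine at all; it only has to contribute a vote.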
I think you might even be able to get away without HA, with just 2 Proxmox hypervisors, if all you do is manually live-migrate VMs. Within Proxmox, you have the option to migrate a VM and run it from another Proxmox host (in the same cluster). When you do that, if you only have local storage configured, the disk will also be transferred over to the other box (assuming you have enough space). If you have a NAS or SAN, the other Proxmox host just locks the VM paths instead.
Moving the VM itself doesn’t take much time. Moving a disk, particularly a large one, can take a while even with ZFS (from 30 minutes to a few hours, depending on the size; your limit is your read and write throughput and your network bandwidth). Not to mention the additional writes on your storage media each time you move a large dataset back and forth.
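As a back-of-the-envelope check (the disk size and throughput figures here are assumptions, not measurements), the transfer time is roughly the data size divided by the slowest link in the chain:

```python
# Rough disk-migration time estimate: the bottleneck is the slowest of
# source read, destination write, and network throughput. All figures
# below are illustrative assumptions.

def migration_hours(size_tb: float, read_mbps: float,
                    write_mbps: float, net_mbps: float) -> float:
    """Estimate hours to move size_tb of disk data (throughputs in MB/s)."""
    bottleneck = min(read_mbps, write_mbps, net_mbps)
    size_mb = size_tb * 1_000_000  # TB -> MB (decimal)
    return size_mb / bottleneck / 3600

# e.g. a 4 TB VM disk: spinning-rust read at 400 MB/s, SSD write at
# 1000 MB/s, 25 GbE network (~3000 MB/s). The disk read is the bottleneck.
print(f"{migration_hours(4, 400, 1000, 3000):.1f} hours")  # ~2.8 hours
```

This is why a fast network alone doesn’t save you if the source pool can only read at spinning-rust speeds.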
Which then raises the question… if you are going to live migrate your surveillance footage and its live storage, access control, home automation and whatever else to another node when there’s an update… do you need a cluster?
You could literally run that on a single node that never gets updated or rebooted (which I don’t prefer myself, but it’s an option). But, if updates are that important to you, then live migration makes sense. Your 2nd node doesn’t need to be nearly as beefy as your main box, as the video editing stuff can happen only on the 1st node.
You can literally do something like: run everything on node 1 and keep your 2nd node powered off. When there’s an update: power on node 2, update it, and reboot it; live migrate your important VMs to node 2; shut down the non-critical VMs on node 1; upgrade and reboot node 1; power the non-critical VMs back on on node 1; move the critical VMs from node 2 back to node 1; and finally, shut down node 2 again. The 2nd node also allows you to test the upgrade before applying it on the main host (if it breaks, you don’t upgrade until you know what broke and how to avoid it).
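That sequence can be sketched as a checklist (node and VM names here are made up; the point is purely the ordering, so that the critical VMs are never on the node that’s about to reboot):

```python
# Sketch of the rolling-update runbook as an ordered checklist. Node
# names are hypothetical. Each entry is (step, where the critical VMs
# live after the step); the invariant is that node1 only reboots while
# the critical VMs are running on node2.

RUNBOOK = [
    ("power on node2",                              "node1"),
    ("update + reboot node2",                       "node1"),
    ("live-migrate critical VMs node1 -> node2",    "node2"),
    ("shut down non-critical VMs on node1",         "node2"),
    ("update + reboot node1",                       "node2"),
    ("restart non-critical VMs on node1",           "node2"),
    ("live-migrate critical VMs node2 -> node1",    "node1"),
    ("power off node2",                             "node1"),
]

for step, critical_vm_host in RUNBOOK:
    # Safety check: node1 never reboots while it hosts the critical VMs.
    if "reboot node1" in step:
        assert critical_vm_host == "node2", "critical VMs would go down!"
    print(f"{step:45s} critical VMs on: {critical_vm_host}")
```

Written out like this, it’s easy to verify that at every step the critical VMs sit on a node that stays up.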
The advantage is that you keep your VMs up and save some power. The disadvantage is definitely the back-and-forth movement of massive amounts of data. But if you plan on keeping node 2 up and running 24/7, you might as well make it serve double duty as a backup server (if you don’t already have one; back up your important data, folks!).