I have bought 10x AMD Milan 7713, 256GB RAM, 12x Samsung 1TB NVMe drives (RAIDZ2 pool?) and Mellanox ConnectX-4 NICs. I want to host these servers in a data center and run Kubernetes on them.
I have watched a couple of videos on YouTube and I still don’t know which OS to use. For this hardware setup, it seems like a good idea to go for TrueNAS, but will this work for a production environment?
I hope I have provided all the information needed to answer my question.
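For reference, here is a rough sketch of the capacity math behind the RAIDZ2 question. It assumes all 12 of the 1TB drives sit in one server and one pool, and it ignores ZFS padding, metadata, and reserved space, so real numbers will come out somewhat lower. The function name `raidz_usable_tb` is made up for this example.

```python
def raidz_usable_tb(drives: int, drive_tb: float, parity: int) -> float:
    """Rough usable capacity of one RAIDZ vdev: data drives times drive size.

    Ignores padding, metadata, and reserved space, so treat it as an upper bound.
    """
    if drives <= parity:
        raise ValueError("a RAIDZ vdev needs more drives than parity devices")
    return (drives - parity) * drive_tb


# Layout A (assumed): one 12-wide RAIDZ2 vdev, 2 drives' worth of parity.
single_vdev = raidz_usable_tb(drives=12, drive_tb=1.0, parity=2)

# Layout B (assumed): two 6-wide RAIDZ2 vdevs striped in the same pool.
two_vdevs = 2 * raidz_usable_tb(drives=6, drive_tb=1.0, parity=2)

print(f"1 x 12-wide RAIDZ2: ~{single_vdev:.0f} TB usable")
print(f"2 x 6-wide RAIDZ2:  ~{two_vdevs:.0f} TB usable")
```

The second layout gives up capacity for more vdevs, which is the usual trade-off to weigh when the goal is throughput rather than space.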
Not yet, but my concern is more about getting the full speed of the NVMes in RAIDZ2 and RDMA support over the network. It’s my understanding that you need the latest kernel for that. I am not 100% sure on this.
That would probably be the better route at this deployment size. Ceph definitely has some overhead, but at this scale, it’s nothing those systems can’t handle.
Ceph is a whole rabbit hole in itself. I haven’t worked with it myself in ~6 years, but I could definitely point you in the right direction if you have any questions.
Since this will be a production setup, I would rather stick with what I know. If ZFS will not give me the performance that I expect, then I might have a look at Ceph. Thank you for the feedback anyway.
The problem with ZFS is that it’s a local filesystem. Kubernetes clusters expect some sort of distributed block filesystem for persistent volumes, and that’s where the challenge comes in. Obviously there are solutions, but Ceph sort of does the heavy lifting in merging all that storage together.
Well, technically, Kubernetes allows you to run with just local storage, but it’s not the most robust solution. Have you considered how you will manage high availability of storage?
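To show why local storage is the less robust option, here is a minimal sketch of a Kubernetes `local` PersistentVolume, built as a Python dict and printed as JSON (which `kubectl apply -f` accepts). The node name, dataset path, size, and the `local-zfs` storage class name are all placeholders I made up, not anything from this setup. The point is the mandatory `nodeAffinity`: the volume is pinned to one node, so a pod using it cannot be rescheduled elsewhere if that node goes down.

```python
import json


def local_pv(name: str, node: str, path: str, size: str) -> dict:
    """Build a manifest for a local PersistentVolume pinned to a single node."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": "local-zfs",  # hypothetical class name
            "volumeMode": "Filesystem",
            "local": {"path": path},  # directory or mount on the node itself
            # Local volumes must declare which node holds the data.
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {
                                    "key": "kubernetes.io/hostname",
                                    "operator": "In",
                                    "values": [node],
                                }
                            ]
                        }
                    ]
                }
            },
        },
    }


# Placeholder values: a ZFS dataset mounted at /tank/k8s/vol1 on node01.
manifest = local_pv("pv-node01-vol1", "node01", "/tank/k8s/vol1", "500Gi")
print(json.dumps(manifest, indent=2))
```

With this approach, any replication or failover across nodes has to be handled by the application or by something layered on top, which is the part a distributed store like Ceph takes care of.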
My biggest concern is the networking side of things and I do not know yet where to post my question on this forum. I have joined another YouTuber’s Discord and will ask there as well.
Regarding the Mellanox gear, you could post in #networking and you might get a better response. I’m not familiar with that gear personally, so I can’t help myself, but the networking category would likely alert people who are more experienced with it.
Have you looked into OpenShift? Potentially on oVirt? It comes with a simple installer, web dashboard, and a somewhat easy to configure Ceph cluster with OpenShift Container Storage. Then your OS install would simplify to either RHEV/oVirt nodes for an automated install or RHCOS for a manual install.