I have been researching how to build a multi-server cluster, either from identical hardware (same-spec servers) or from whatever is available.
Wendell made a Level1Techs video about the Tyan 4-node 2U server, and he mentioned that it is possible to make all the nodes part of one logical unit.
Now the question is: what software framework is needed for this, and what back-end?
Please share your insights, because I am stuck on this one.
For the back-end, I assume I have to use Ethernet, so in theory 100GbE connectivity is what I am aiming at. Or are there other options, or does this one not work?
Hmm, I thought Kubernetes was for containers, something that you run alongside the cluster framework that handles the hardware part.
For clarification, as a very quick rundown:
I install all my servers with, let's say, Rocky Linux 9.1, then I set up a management node and deploy agents on the other servers? (It is an on-premises deployment using open source software.)
Then I just put whatever task I want to run in a container and let Kubernetes handle the distribution of that task?
I am sorry, I am new to this and never had a chance to work with containers and clusters, but now is the time, and I just need a general direction on where to dig for information...
The process of containerization is important, as you will (have to) turn your task into a "service". This generally works very well for things like web apps, but is more difficult for things like databases.
Once that is done, Kubernetes (or k8s in short) can deploy containers across nodes, maintain failover capabilities, automatically scale resources up by launching additional containers, etc.
If all of this is quite new to you, I would recommend starting with Docker as a means to build containers and get familiar with the technology.
After that, look at docker-compose as a way to build more complex applications where different container types depend upon each other.
k8s is the icing on this cake that adds automatic deployment and all the other goodies.
K8s is a beast to admin in its own right. I like to recommend HashiCorp Nomad as well for clustering; it can handle a lot more job types than just containers. If you haven't worked with containers at all, setting up K8s is going straight to hard difficulty. Docker Swarm is also a much simpler foray into clustering to get your feet wet.
Yes, I need to figure out how to administer the cluster...
Just a quick question: how is Nomad from an administration perspective? Does it have some sort of web interface, or...?
Anyway, I have a lot of material to research, and it sounds like a lot of sleepless nights ahead while I dig through this.
I think the type of cluster you want to build really depends on the workload. If the goal is to host services, then Proxmox/K8s/k3s/Swarm make more sense. If the idea is to do "supercomputer"-type research work, then probably something different.
The main workload is bioinformatics pipelines, which need raw CPU and RAM power (GPU and FPGA are optional), so it seems containers are the way to go. It's like an HPC cluster, something like Frontier, but at a much, much, much smaller scale.
And there is no virtualization under it, because all servers are physical and run the same OS. The specs for the test bench setup are just whatever is available (Xeon and EPYC of random generations and models, etc.).
The HPC clusters that you are referring to run applications coded to run in parallel using MPI (Message Passing Interface), so that performance can scale by adding more nodes. The nodes all work together to complete the same job, as if all of the servers were combined into one.
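To make the "many nodes, one job" idea a bit more concrete, here is a toy sketch in plain Python. It is not real MPI: the "ranks" are simulated in a single process, and the names `split_work`, `rank_compute` and `run_job` are made up for illustration. In a real MPI program (for example with mpi4py), the split and the final sum would roughly correspond to scatter and reduce calls across processes on different nodes.

```python
# Toy illustration of the MPI-style "split one job across ranks" pattern.
# No real MPI here: the ranks are simulated inside one Python process.

def split_work(items, n_ranks):
    """Give each simulated rank a contiguous slice of the input."""
    chunk, extra = divmod(len(items), n_ranks)
    slices, start = [], 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional item each.
        size = chunk + (1 if rank < extra else 0)
        slices.append(items[start:start + size])
        start += size
    return slices

def rank_compute(local_items):
    """The work one rank does on its own slice (here: a sum of squares)."""
    return sum(x * x for x in local_items)

def run_job(items, n_ranks):
    """Split the job, let every rank compute its part, then combine."""
    partials = [rank_compute(s) for s in split_work(items, n_ranks)]
    return sum(partials)  # the "reduce" step

data = list(range(1, 101))
print(run_job(data, 4))
print(run_job(data, 8))
```

The point is that the answer is the same no matter how many ranks share the work; adding ranks only changes how much each one has to do. Whether your pipelines can be split like this is exactly the "does my workload support MPI" question.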
Kubernetes scales differently: you run multiple instances of the same app, and requests are load balanced across multiple servers, so a single job request can never truly exceed the capacity of the node it is running on.
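A rough way to picture that limit is the toy placement function below. It is only a sketch and not how the real Kubernetes scheduler works internally; `place_pods`, the node names and the core counts are all hypothetical. Each pod goes to the node with the most free cores, and a pod that is bigger than any single node simply cannot be placed.

```python
# Toy placement model: each pod must fit entirely on ONE node.
# This is an illustration only, not the actual Kubernetes scheduling algorithm.

def place_pods(node_capacity, pod_requests):
    """Place each pod on the node with the most free cores, or None if it cannot fit."""
    free = dict(node_capacity)  # copy so the caller's dict is not modified
    placement = {}
    for pod, cores in pod_requests.items():
        best = max(free, key=lambda n: free[n])
        if free[best] >= cores:
            free[best] -= cores
            placement[pod] = best
        else:
            placement[pod] = None  # no single node has enough free cores
    return placement

# Hypothetical cluster: 48 cores in total, but at most 32 on one node.
nodes = {"node-a": 16, "node-b": 32}
pods = {"web-1": 8, "web-2": 8, "web-3": 8, "big-job": 40}

for pod, node in place_pods(nodes, pods).items():
    print(pod, "->", node)
```

In this sketch the 40-core job stays unplaced even on an empty cluster with 48 cores in total, because no single node has 40. That is the gap MPI-style jobs fill.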
If your workload supports MPI then you can use a cluster manager to manage the systems. Here are some of the more common options:
OpenHPC is a framework that puts together all of the main components you need to build a cluster. The easiest option would be to go with the Rocky 8.6/Warewulf/Slurm variant.
Bright Cluster Manager is easier to get up and running and has a nice web interface that makes it more user friendly for beginners. I would also stick with the Rocky/RHEL variants here too. The only downside is that the licensing costs are not cheap, but if you never plan to go beyond 8 nodes they have a free option called "Bright Easy8". You will need to make an account on the Bright Computing customer portal site to access the Easy8 downloads.
You can run the head node on a separate system that is less powerful, or even a virtual machine connected to the provisioning network, since you generally don't want your head node to run on a node that will be running your workloads.
For the MPI network, systems like this normally use a dedicated InfiniBand or Omni-Path fabric as their high-speed interconnect to cut down on latency, but 100GbE will work fine.
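If it helps to see why latency gets so much attention, here is a back-of-the-envelope sketch. It models the time for one message as a fixed latency plus the bytes on the wire divided by the bandwidth. The two latency values are made-up placeholders for comparison only, not measurements of any fabric, and `transfer_time_s` is a hypothetical helper.

```python
# Simple model: time = fixed per-message latency + bits on the wire / bandwidth.
# The latency figures below are made-up placeholders, NOT measurements.

def transfer_time_s(message_bytes, latency_s, bandwidth_bits_per_s):
    """Estimated one-way time to deliver a single message, in seconds."""
    return latency_s + (message_bytes * 8) / bandwidth_bits_per_s

LINK_100G = 100e9  # 100 Gbit/s, as in 100GbE

# Hypothetical per-message latencies, purely for illustration.
LOW_LATENCY_S = 1e-6
HIGH_LATENCY_S = 10e-6

for size in (8, 1_000_000):
    fast = transfer_time_s(size, LOW_LATENCY_S, LINK_100G)
    slow = transfer_time_s(size, HIGH_LATENCY_S, LINK_100G)
    print(f"{size:>9} bytes: {fast * 1e6:8.2f} us vs {slow * 1e6:8.2f} us")
```

For tiny messages the wire time is negligible and the fixed latency is basically the whole cost, while a 1 MB message spends about 80 microseconds on a 100 Gbit/s link regardless. So how much the interconnect choice matters depends on whether the code sends many small messages or a few large ones.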
Yes, I want to stick to Rocky/RHEL as the backbone, because it has awesome documentation and a strong community behind it. Additionally, I have been using RHEL and CentOS in production environments, and it's a rock-solid platform.
As for this case, the applications/pipelines that need this HPC setup will be inside containers for one simple reason: to make them easy to update and deliver.
Sadly, I am not sure my workload supports MPI, but it is definitely something to explore, as it's something new for me.
Regarding the connection backbone, I am familiar with Ethernet and don't have much experience with InfiniBand or Omni-Path fabric, so that is why I want to stick to Ethernet.
Another one to look at is Apptainer; it's an HPC-oriented containerization infrastructure (which in my opinion is an oxymoron... there is a reason SLURM is what is used in the TOP500).
Container runtimes, especially the ones originally created for web services, leave performance on the table when used to run HPC code.