Hi All
I am working on a proof of concept for a micro hosting requirement. The client requires hosting of many small VMs, about 30-100 of them with 2GB of RAM and 30GB of storage each. The CPU and disk load generated by these VMs is very low.
They are looking to retire their two old Dell R540 servers due to very high datacentre power costs.
Rather than buying a single server to replace them and having that single point of failure, we are thinking of going a homelab-ish route and using micro desktops as servers in a cluster. We can't seem to find clustering hardware that is not total overkill for this requirement, or that isn't terribly expensive relative to the resources you get.
We are investigating the option of setting up a cluster of Intel NUCs like this one: https://www.intel.com/content/www/u…nuc-12-pro-kit-nuc12wshi5/specifications.html
We like that it has a 2.5G NIC with vPro. We will likely put public traffic on a VLAN interface set up in Proxmox, and use the same NIC for the management LAN and backups to external storage.
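To give an idea of what we mean, one way the single onboard NIC could look in /etc/network/interfaces is below. This is only a rough sketch; the interface name enp86s0, the addressing, and VLAN 100 for public traffic are placeholders we have not confirmed on this hardware.

auto lo
iface lo inet loopback

iface enp86s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.11/24
        gateway 192.168.10.1
        bridge-ports enp86s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
# management LAN and backups use the untagged bridge address above;
# public VMs would just get their virtio NIC on vmbr0 with tag=100

With a VLAN-aware bridge like that, nothing extra is needed on the host for the public VLAN, only the tag on each guest NIC.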
We are also wondering: since each NUC has two Thunderbolt 4 ports, would it be possible to build a 10-gig ring network with Thunderbolt cables and avoid having to buy expensive Thunderbolt to 10-gig adaptors? This ring network would likely use OSPF, like in apaird's video here: Fully Routed Networks in Proxmox! Point-to-Point and Weird Cluster Configs Made Easy - YouTube
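Roughly what we are picturing per node on the routing side is the FRR config below. Again just a sketch under assumptions: ospfd enabled in /etc/frr/daemons, the Thunderbolt network interfaces renamed to en05 and en06, a /32 loopback per node as router ID, and a /31 per cable; all names and addresses are placeholders.

! /etc/frr/frr.conf on node 1
interface lo
 ip address 10.10.10.1/32
!
interface en05
 ip address 10.10.11.0/31
 ip ospf network point-to-point
!
interface en06
 ip address 10.10.11.2/31
 ip ospf network point-to-point
!
router ospf
 ospf router-id 10.10.10.1
 network 10.10.10.0/24 area 0
 network 10.10.11.0/24 area 0
!
! the neighbour on the far end of each cable takes the other address of that /31

The idea, as in the video, is that traffic between the loopback addresses can reroute the long way around the ring if a single cable or node drops out.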
We are looking at 3 or 5 nodes initially, up to a maximum of about 9 if the concept works very well. If we need more than that, we can probably justify buying a Supermicro 4-node server to replace it.
Each node would use a SATA SSD for boot and an M.2 SSD for VM data. We know that there is no disk redundancy, but the requirement can tolerate 5 minutes of downtime and a minute or two of data loss in the case of a node failure. We are wondering what would work best for storage.
We are wondering if it is viable to set up the M.2 SSDs in a Ceph cluster with one OSD per node. We will be using something decent for the M.2 SSD, and at 2.5-gig or 10-gig networking I don't see the SSD being the performance bottleneck. The shared storage should allow for migration and HA in the case of node failures or maintenance. I know general practice is to use at least 4 OSDs per node, but I am not certain of the thinking behind that. I have seen people using single-OSD nodes in their lab environments.
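For what it's worth, the Ceph setup we are imagining is not much more than the following. Only a sketch with placeholder values: it assumes the routed ring carries the Ceph network on 10.10.10.0/24 (the loopbacks from the earlier sketch) and that the M.2 drive shows up as /dev/nvme0n1.

# on every node: install the Ceph packages
pveceph install
# once, on the first node: point Ceph at the ring network
pveceph init --network 10.10.10.0/24
# on each node (or at least three of them): monitor, manager, and one OSD on the M.2
pveceph mon create
pveceph mgr create
pveceph osd create /dev/nvme0n1
# once, after the OSDs exist: replicated pool, 3 copies, writes still allowed with 2
pveceph pool create vmdata --size 3 --min_size 2 --add_storages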
Is there anything less obvious that we may be missing, or is anyone using hardware other than Intel NUCs for a similar purpose?