Hi, I’m upgrading the server infrastructure at our company. We are a small company: some employees who work from home use a Windows RDP service, and some of our products run in k8s. Databases run on separate VMs with replication handled at the database-server level. Right now we have four small servers in standalone configurations: two with Proxmox, one with VMware ESXi, and one with Windows Server.
I’m planning to move to a Proxmox cluster of three servers. I will reuse the newest and most powerful of our old servers, a Dell R720XD; its only role will be to act as the third node in the cluster.
Each server will have (a rough sketch of the ZFS setup follows this list):
- Two SATA SSDs in a ZFS RAID1 mirror for the OS.
- 6*6TB HDDs in ZFS RAIDZ2 for slow data.
- 2*1TB NVMe SSDs (Seagate FireCuda 510) in LVM RAID0 for Ceph, and maybe an L2ARC partition for ZFS.
- A separate network for cluster communication over SFP+ (Intel X520 daughter card + MikroTik CRS309-1G-8S+IN).
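To make the layout concrete, here is a minimal sketch (Python shelling out to zpool) of how the data pool and optional L2ARC on each node might be created. The pool name "tank", the device names, and the NVMe partition used for cache are placeholders I made up for illustration; the mirrored OS pool would normally be created by the Proxmox installer, and a real setup should use /dev/disk/by-id paths.

```python
#!/usr/bin/env python3
"""Rough sketch of the per-node ZFS layout described above.

Device names and the "tank" pool name are placeholders (assumptions);
in practice you would use /dev/disk/by-id paths and let the Proxmox
installer create the mirrored OS pool (rpool) itself.
"""
import subprocess

def run(cmd):
    # Print and execute a command; raise if it fails.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Bulk-data pool: six 6TB HDDs in RAIDZ2.
run(["zpool", "create", "-o", "ashift=12", "tank",
     "raidz2", "sdc", "sdd", "sde", "sdf", "sdg", "sdh"])

# Optional: add a partition of one NVMe drive as L2ARC for the HDD pool.
# Only worth it if the working set really exceeds what fits in ARC (RAM).
run(["zpool", "add", "tank", "cache", "nvme0n1p2"])
```

The Ceph OSDs on the NVMe RAID0 would be created separately (for example through the Proxmox GUI) and are not part of this sketch.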
I’m aware that the Seagate FireCuda 510 is not made for enterprise or server loads. My current statistics:
- 100 TBW on one SSD drive over 6-7 years.
- 70 TBW on another SSD drive over 3 years.
- We consume about 100 GB of RAM across all servers.
I don’t want to use ZFS on the NVMe drives because they are not built for enterprise load and could wear out too quickly. My goal is to run the new cluster for five years, which is why I want to try Ceph on the NVMe drives. I will use that storage for data such as databases, or for critical OS disks if the HDD storage is too slow.
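As a sanity check on the endurance worry, here is a back-of-the-envelope projection based on the statistics above. The rated endurance value is my assumption (Seagate lists roughly 1300 TBW for the 1TB FireCuda 510; please verify against the exact SKU), and the write-amplification factors are just illustrative guesses for Ceph replication and database traffic.

```python
# Back-of-the-envelope SSD wear projection.
# RATED_TBW is an assumption (~1300 TBW for the 1TB FireCuda 510);
# check the spec sheet for the exact SKU.

RATED_TBW = 1300          # assumed rated endurance, TB written
observed = [
    (100, 6.5),           # 100 TBW over ~6-7 years
    (70, 3.0),            # 70 TBW over 3 years
]

worst_rate = max(tb / years for tb, years in observed)   # TB/year
print(f"Worst observed write rate: {worst_rate:.1f} TB/year")

for amplification in (1, 3, 5):   # rough factors for Ceph replication/journaling overhead
    five_year_writes = worst_rate * 5 * amplification
    print(f"x{amplification} write amplification -> "
          f"{five_year_writes:.0f} TB in 5 years "
          f"({100 * five_year_writes / RATED_TBW:.0f}% of rated TBW)")
```

Under these assumptions even a 5x heavier write load stays under half of the rated endurance over five years, but a database-heavy Ceph pool can behave very differently from the current workload, so treat this only as a rough bound.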
I’m aware that Ceph on top of ZFS is not the best solution; if I’m wrong, please correct me.
I have about three months to set up this cluster and run some tests before switching the production load over to it.
I will use a PCIe-to-M.2 adapter and add protection against data loss on power failure. We have a 3 kVA APC UPS and plan to load it to no more than 25-33%. I have already added a script that checks the battery status and, if it gets too low, sends a command to the servers to shut down properly (a rough sketch of that approach follows).
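For reference, this is roughly the kind of watchdog script I mean, assuming apcupsd is installed (so `apcaccess status` works) and the monitoring host can SSH into each node as root with key auth. The node hostnames and the 30% threshold are placeholders, not my real values.

```python
#!/usr/bin/env python3
"""Sketch of a UPS watchdog: poll apcupsd and shut nodes down on low battery.

Assumes apcupsd is installed (apcaccess in PATH) and that this host can
SSH to each node as root with key auth. Hostnames and the threshold are
placeholders.
"""
import subprocess
import time

NODES = ["pve1.example.local", "pve2.example.local", "pve3.example.local"]
MIN_CHARGE = 30.0   # shut down below this battery percentage while on battery

def ups_status():
    """Parse `apcaccess status` output into a dict of field -> value."""
    out = subprocess.run(["apcaccess", "status"],
                         capture_output=True, text=True, check=True).stdout
    fields = {}
    for line in out.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

def shutdown_all():
    for node in NODES:
        # "shutdown -h now" lets the node's own shutdown hooks stop guests cleanly.
        subprocess.run(["ssh", f"root@{node}", "shutdown", "-h", "now"])

def main():
    while True:
        status = ups_status()
        on_battery = "ONBATT" in status.get("STATUS", "")
        charge = float(status.get("BCHARGE", "100").split()[0])
        if on_battery and charge < MIN_CHARGE:
            shutdown_all()
            break
        time.sleep(30)

if __name__ == "__main__":
    main()
```

A more robust version would also look at the remaining runtime (the TIMELEFT field) rather than the charge percentage alone.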
Configuration in short:
Two main servers: Dell R730XD (2*E5-2680 v4, 256GB RAM, 2*256GB SATA SSD, 6*6TB HDD, Intel X520 + i350 network card, dual 1100W PSU)
Third server: Dell R720XD (2*E5-2680 v2, 512GB RAM, 2*256GB SATA SSD, 6*6TB HDD, Intel X520 + i350 network card, dual 1100W PSU)
My questions are:
How bad is it to mix ZFS and Ceph in one cluster?
Could you share best practices, or suggest a different solution instead of my workaround?