tl;dr: Given the drive specs below, how would you set up swap for Proxmox and its VMs on an all-SSD server?
I’ve got high-endurance NVMe drives and SATA SSDs that I’ll be using for a RAID 1 OS pool and a RAID 1 VM store pool, respectively. After working with an endurance calculator, I’m fairly confident that the default swap settings in Proxmox wouldn’t kill them for years, given my anticipated GB/day of writes.
But I also know that all the optimization guides mention minimizing swap writes to get the most life out of your SSDs.
1. Do nothing and use the default swap settings and, unless something unexpected shows up when I’m monitoring disk usage, enjoy years of service on the OS and VM store drives.
1.1. According to the SSD endurance calculator, even if I somehow wrote 30 GB a day to my OS drives, they’d still have ~70 years of endurance.
1.2. With the same calculator, my SATA SSDs, at 30 GB/day of writes, should last about 54 years.
2. Completely disable swap in Proxmox and all VMs; or
3. Set swappiness to 1 for both Proxmox and all VMs.
Thanks for any advice.
These are my drives:
BOOT: 2x Sabrent Rocket 4.0 NVMe - 500 GB
Max sequential read: 5000 MB/s
Max sequential write: 2500 MB/s
TBW: 850 TB
Warranty: 5 years
VM Store: 2x Samsung 870 EVO SATA III SSD - 1 TB
Sequential read: up to 560 MB/s
Sequential write: up to 530 MB/s
TBW: 600 TB
Warranty: 5 years
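For anyone who wants to sanity-check the endurance numbers, here is a rough Python sketch of the arithmetic, using the TBW ratings listed above and the 30 GB/day figure from the post. It assumes decimal units (1 TB = 1000 GB) and a constant write rate, and it ignores write amplification, so an online calculator that accounts for such factors may report lower figures. The function name `endurance_years` is just for illustration.

```python
def endurance_years(tbw_terabytes: float, gb_per_day: float) -> float:
    """Years until the rated TBW is reached at a constant daily write volume."""
    total_gb = tbw_terabytes * 1000  # decimal units: 1 TB = 1000 GB
    days = total_gb / gb_per_day
    return days / 365


# Rated TBW values (in TB) taken from the drive list in the post.
drives = {
    "Sabrent Rocket 4.0 500 GB": 850,
    "Samsung 870 EVO 1 TB": 600,
}

for name, tbw in drives.items():
    years = endurance_years(tbw, gb_per_day=30)
    print(f"{name}: {years:.1f} years at 30 GB/day")
```

Plain division like this gives roughly 78 years for the NVMe drives and roughly 55 years for the SATA SSDs, which is in the same ballpark as the calculator results quoted above.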
I personally try not to rely on swap, on NVMe or otherwise, and prefer to rely on the page cache, the OOM killer, systemd, and the software itself invoking mlock as necessary (and being relatively deterministic from a resource perspective).
The only form of swap I use, on low-RAM Raspberry Pi-like systems or cloud VMs, is zram (see the ZRam page on the Debian Wiki), because these systems often have 512 MB or 2 GB of RAM or less, they’re running stuff where RAM usage is highly dependent on the workload at a given moment, and their storage is usually super slow relative to their CPU performance.
It gives the device a bit of a buffer before it runs out.
For servers, I prefer making RAM usage deterministic, i.e. if a server process is using too much RAM, kill it and restart it.
Here’s documentation on how to set limits for particular services… so that you don’t run out
As a general principle for setting actual values, I’d look at the memory footprint of a freshly booted system with nothing running on it and add a 20% safety factor. I’d make sure sshd is never killed, but I’d make SSH user session slices the first thing on the chopping block for the OOM killer (to prevent ad hoc commands from sucking up the RAM).
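As a rough illustration of that rule of thumb, here is a minimal Python sketch that takes a measured footprint, adds 20% headroom, and prints systemd drop-in text. This is one possible reading of the approach, not a definitive recipe: the 1500 MiB figure is made up, and the helper names `limit_with_headroom_mib` and `render_dropin` are hypothetical. `MemoryMax=` and `OOMScoreAdjust=` are real systemd directives; `OOMScoreAdjust=-1000` exempts a service from the OOM killer.

```python
from typing import Optional


def limit_with_headroom_mib(baseline_mib: int, headroom_percent: int = 20) -> int:
    """Measured footprint plus a safety margin, rounded up to a whole MiB."""
    scaled = baseline_mib * (100 + headroom_percent)
    return (scaled + 99) // 100  # integer ceiling division


def render_dropin(memory_max_mib: Optional[int] = None,
                  oom_score_adjust: Optional[int] = None) -> str:
    """Build the text of a [Service] drop-in with only the requested settings."""
    lines = ["[Service]"]
    if memory_max_mib is not None:
        lines.append(f"MemoryMax={memory_max_mib}M")
    if oom_score_adjust is not None:
        lines.append(f"OOMScoreAdjust={oom_score_adjust}")
    return "\n".join(lines) + "\n"


# Made-up measurement: a service that sits at 1500 MiB on a quiet system.
measured_mib = 1500
print(render_dropin(memory_max_mib=limit_with_headroom_mib(measured_mib)))

# Protect sshd: -1000 is the lowest score and disables OOM killing for it.
print(render_dropin(oom_score_adjust=-1000))
```

The generated text would go into a drop-in created with `systemctl edit <service>`; which services get which limits is something you’d have to decide from your own measurements.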
Prometheus has a node exporter, and there’s also a systemd-exporter; both have useful RAM metrics that you can set up to record and then graph over time if you suspect regressions or suspect you’re running out.
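If you want a quick look before setting up full monitoring, the same kind of numbers can be read from `/proc/meminfo`. Below is a small Python sketch that parses meminfo-style text and reports how much swap is in use. The sample values are invented for illustration, and the helper names `parse_meminfo` and `swap_used_kib` are hypothetical; on a real host you would pass in the contents of `/proc/meminfo` instead.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (values mostly in kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kib(info: dict[str, int]) -> int:
    """Swap currently in use, derived from the SwapTotal and SwapFree fields."""
    return info["SwapTotal"] - info["SwapFree"]


# Invented sample in the /proc/meminfo format, not real measurements.
SAMPLE_MEMINFO = """\
MemTotal:       131072000 kB
MemAvailable:    98304000 kB
SwapTotal:        8388604 kB
SwapFree:         8126460 kB
"""

info = parse_meminfo(SAMPLE_MEMINFO)
used = swap_used_kib(info)
print(f"swap in use: {used} kB of {info['SwapTotal']} kB")
print(f"memory available: {info['MemAvailable']} kB of {info['MemTotal']} kB")
```

Recording these values over time (which is what the exporters do for you) is what lets you tell a one-off spike from a steady climb.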
I’ve got 128 GB of RAM on the host, with 8 cores and 16 threads (which might get upgraded to 12 cores, 24 threads).
I can’t put any more RAM in this thing, but for a home server, I think it should be more than adequate.
From what you said, I’m thinking I should start out with swap disabled completely on the host and VMs, and then just adjust memory up on the VMs if they show up as problematic in the logs?