Consider the following: I have a Proxmox server with multiple VMs and ZFS storage, which also serves as my NAS through NFS (and soon SMB) shares. So far, so good; I'm very happy with this setup.
However, I am running low on storage, and while planning the upgrade I started thinking about how I currently "waste" a lot of space by over-provisioning virtual disks, and how I could better utilize storage space.
My question now is how to properly store and scale application data (not sure if this is actually the correct term, please correct me if not). By application data I mean, for example, the data my users store on my Nextcloud instance, the time-series data from my InfluxDB instance, or simply my media collection. The common characteristic here is that they are all continuously growing.
Let's take my Nextcloud instance as an example. I currently set it up as a VM with two virtual disks: one as the boot drive and one for data. The data disk is currently 500G and closing in on capacity, while other VMs have >100G of free space. I could now either expand the virtual disk (using, for example, Proxmox's built-in disk resize feature), create a new, bigger virtual disk and transfer the data over, or move the data to the Proxmox host into a dedicated dataset and share it back into the VM via a network share. All three options seem feasible to me, but I am unsure which is the proper way to handle such situations. I bet this is a very common situation in real production systems, for example when a database inside a VM is continuously growing. How is this done by professionals (I am a mere sysadmin hobbyist by night), and what approach is considered "best practice" here? Did I miss anything super obvious?
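For context, the first option (growing the existing virtual disk) would look roughly like this. This is just a sketch: the VM ID 101, the disk slot scsi1, the guest device /dev/sdb, and the ext4 filesystem are all assumptions about my setup, not universal values.

```shell
# On the Proxmox host: grow the data disk by 200G.
# Note: qm resize can only grow a disk, never shrink it.
qm resize 101 scsi1 +200G

# Inside the guest: grow the partition and then the filesystem.
# Assumes the data disk appears as /dev/sdb with a single ext4 partition.
growpart /dev/sdb 1      # from the cloud-guest-utils package
resize2fs /dev/sdb1      # ext4; an XFS filesystem would need xfs_growfs instead
```

The obvious drawback, and the reason I'm asking, is that this has to be repeated per VM every time a disk fills up.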
The idea of storing data on the host and providing it to the application VMs via a network share seems appealing. To me, the benefits are:

a) I only have to care about scaling a single point (e.g. add another vdev to my pool),
b) I don't have to estimate the growth rate of every application's data requirements,
c) no space is "wasted" by over-provisioning a data disk for one specific application, since the storage space is shared by all applications.

The downsides are:

d) a more complicated setup,
e) more complicated backups (I can't just press backup in the GUI, although I already set up ZFS replication with another TrueNAS box for non-VM-image datasets, so that would work),
f) overhead from the additional network step (even though speed probably won't be an issue, as it's on the same host),
g) my gut feeling screams security, but I actually can't pinpoint why.

Did I miss any?
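For the record, my rough idea of how this host-side setup would look, so people can tell me if I'm on the wrong track. The pool name tank, the subnet, the host IP, and the mountpoints are all hypothetical placeholders, and I believe the sharenfs option syntax follows exports(5) on Linux, but corrections welcome:

```shell
# On the Proxmox host: create a per-application dataset and export it via NFS.
zfs create -p -o mountpoint=/tank/appdata/nextcloud tank/appdata/nextcloud
zfs set sharenfs="rw=@192.168.1.0/24,no_root_squash" tank/appdata/nextcloud

# Inside the Nextcloud VM: mount the share persistently via /etc/fstab.
echo '192.168.1.10:/tank/appdata/nextcloud  /mnt/ncdata  nfs  defaults,_netdev  0 0' >> /etc/fstab
mount /mnt/ncdata
```

One dataset per application would also let me set per-app quotas, snapshots, and replication policies, which is part of the appeal.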
So, how are you all storing and scaling application data? Any help or insights are very much appreciated.