I’ve been hesitant to rely on Docker containers for “production” (long-term use in my homelab) because the default instructions always seem to assume you’re OK with everything living on the host’s local drive. I had to figure out how to set my Docker server up before relying on it for anything important, and I had three requirements:
1. Point Docker containers to network storage for both data and config/database. This is usually easy, but different for each application.
2. Back up the information about where each application’s data/config is stored, along with any other commands, so I can redeploy all my services without manually rethinking step 1 for each one.
3. Be able to migrate or rebuild the entire Docker host without remembering too many commands or specifics about every app I am running and where its data is stored.
This may seem obvious to some (and inefficient to others who have a better solution - let me know what you do!), but here’s my setup to build, and easily rebuild, my Docker server with everything on it. It’s not rocket science (I’m using Cockpit and Portainer exactly as intended to help manage the host and containers), but it’s worked well for me.
1. Create a basic Debian VM, do basic updates/setup, install Cockpit.
2. Use Cockpit to install NFS support and map my NFS shares to mount points on the host - these shares contain folders for config and sometimes a separate data location for each application (example after this list).
3. Install Docker Engine per the instructions on their website.
4. Install Portainer CE per the instructions on their website, tweaking the run command so Portainer’s data directory is at its designated network location, which in my case is mounted at /mnt/NFS/Portainer (see the sketch after this list).
5. First-time setup: use Portainer to configure containers/stacks with variables like storage locations and other special setup on the compose/environment side of things.
5a. All the config to pull, run, and point my apps at the correct storage locations is now in the Portainer folder on my network drive.
5b. If I’m recreating this setup on a new machine - Portainer instantly knows all the config I did in the past as soon as it’s installed. I can just click to re-pull and run each of my stacks, each app will automatically be pointed to its designated config/data location on the network share, and will pick up exactly where it left off.
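To illustrate the NFS mount and the tweaked Portainer command, here’s roughly what they look like (the NAS hostname and export path below are placeholders, and the Portainer command is just the standard install command from their docs with the /data volume pointed at the NFS mount):

# /etc/fstab entry so the share is mounted before Docker starts (hostname/export are examples)
nas.local:/mnt/tank/docker   /mnt/NFS   nfs   defaults,_netdev   0   0

# Portainer CE install, tweaked so its data directory lives on the share
docker run -d -p 8000:8000 -p 9443:9443 \
  --name portainer --restart=always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /mnt/NFS/Portainer:/data \
  portainer/portainer-ce:latest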
I needed to play with Maproot and ensure a passthrough-type ACL for apps to get the permissions to run databases etc. on an NFS share. I tried mounting a network share at /var/lib/docker to store the actual images, but I had problems, likely permission-related. Re-pulling images when restoring/migrating to a new server works just fine for me, though.
Looks good to me; here’s my approach:
In my homelab I have Docker running in an LXC on Proxmox. This has been fine since some time in 2023, even with ZFS (something related to kernel 6.1.x), but it’s less isolated than a VM, if that’s important.
Anyway, I do everything on the CLI. The mount point /mydocker is mapped to a ZFS NVMe mirror pool on the host, and snapshots on the host are taken with sanoid and backed up with syncoid.
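Not my exact config, but as a rough sketch (dataset and host names here are placeholders), the sanoid/syncoid side can be as simple as:

# /etc/sanoid/sanoid.conf on the host
[nvme/mydocker]
        use_template = production
        recursive = yes

[template_production]
        hourly = 24
        daily = 14
        monthly = 3
        autosnap = yes
        autoprune = yes

# run from cron or a systemd timer: replicate the snapshots to the backup box
syncoid -r nvme/mydocker root@backupbox:backuppool/mydocker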
For restore, there is a /mydocker/docker_start_all.sh:
docker_start_all.sh
#!/bin/bash
# Re-pull and restart every compose project found directly under /mydocker
for d in $(find /mydocker -maxdepth 1 -mindepth 1 -type d); do
    cd "$d" || continue
    echo "Folder: $d"
    docker compose pull && docker compose up -d || echo "------------ FAILURE ----------------------------------"
done
Some data lives on an HDD pool; it’s also mountpointed into the LXC and from there bind mounted into the containers that need it via docker-compose.yml.
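As a minimal sketch of that chain (the container ID, dataset, and paths are placeholders, not my real ones):

# on the Proxmox host: bind a directory from the HDD pool into the LXC
# (bind mount points like this are not part of the LXC's backup)
pct set 101 -mp0 /hddpool/media,mp=/mnt/media

# inside the LXC, in the app's docker-compose.yml:
services:
  myapp:
    image: myapp:latest
    volumes:
      - /mnt/media:/data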
The LXC should not hold any important data. But since I’m using PBS and have other containers running the same OS, backing it up doesn’t consume much extra space, thanks to PBS deduplication. So I also back up this LXC daily (the backup does not include the mount points, of course).
I don’t know if this is better or worse; at least it’s different.
Those scripts and config files can all benefit from a git init in their directory root, along with an occasional git commit -am $(date -I), so you can roll back a bad change and review what you used to do.
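For example, in the directory holding the compose files and scripts (using /mydocker from above as the path):

cd /mydocker
git init
git add -A
git commit -m "$(date -I)"
# then after each change:
git commit -am "$(date -I)"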
Tools like etckeeper are a no-brainer once you’ve seen the gain, but they need to be in place from early on so that you can use the power of version control.
I don’t want to rely on my network or the NAS for the software running in the containers to function. If either of those stops working, the containers could be affected as well.
I run ZFS on the Docker host, so my compose files and all path-bound volumes get regular snapshots. Those snapshots are backed up to a NAS that is used purely as a backup target.
Oh, and remember: just because the data is on a different host doesn’t mean you can simply start using the stored data. Maybe your Docker host fails so catastrophically that you have to rebuild it, but while crapping out, it only half-wrote a database transaction to a file on your NAS. Now you still have to repair that database, if that’s even possible.
The best solution would be to stop all containers, take the snapshot, and start them again. Or you send a command to the database telling it to sync everything to disk, take the snapshot, and then send a resume command.
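A minimal sketch of the stop/snapshot/start approach (the pool and dataset name are just an example):

running=$(docker ps -q)                              # remember what was running
docker stop $running
zfs snapshot -r tank/docker@pre-backup-$(date -I)    # snapshot the dataset holding compose files + volumes
docker start $running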
This is probably way more efficient on RAM, if not other resources too, and it also lets you store Docker data on the same disks you’re using for other Proxmox storage. The main reason I didn’t do bind mounts and LXC is that I run a Proxmox cluster, so a VM gives me live migration. Having the data stored locally on one host would also make migration impractical, though with replication etc. it probably could work! So it sounds like you have a very good solution; I just had my reasons to want my data on a NAS, which pushed me in another direction.
I’ve definitely considered the reliance on the NAS a weak point, but ultimately there’s stuff that needs to be on a NAS due to its size and the need for redundancy/backup - like Immich photos - so at least some of my services are going down if the NAS goes down. I’ve just accepted that if the NAS breaks, it will need to be rebuilt first. My NAS runs TrueNAS, so I do have snapshots to roll back to if things go haywire.
Replication of mount points is not possible. Ceph may be a solution, but I think it has performance penalties. I’ve never tried it, just read some stuff and decided not to cluster.
One way could be to split up the storage: keep the important stuff in the cluster with either replication or Ceph, and the big stuff on the NAS. Of course it depends on your situation, and maybe you’ve already thought of it and chosen not to.
This would also make the backup situation even more complicated xD
Snapshots are not a backup against disasters like lightning or fire, or when the power supply blows up everything connected to it. I assume you know this, I just wanted to clarify for other readers.