I am very new to self hosting and trying to learn, so please let me know if I missed some critical information.
For a while now I have been looking at something strange on my server:
I am running Proxmox with its root on a ZFS mirror, with some VMs on it. All of the VM disks are zvols on this first ZFS mirror.
The Ubuntu server VM has two VM disks: one for system things on the first mirror, and a zvol on a second ZFS mirror that holds only data.
The data zvol is formatted as ext4 and mounted inside the VM's file system. The data is basically photos and stuff, so it does not change much (some hundreds of MB per week); it is shared to the network via Samba and also updated via Syncthing.
But if I check the disk usage in Proxmox I see the following:
So the disk seems to run “full” until the weekly fstrim cuts it back down again. I thought this would be normal if you have a lot of writes, due to ZFS being a COW filesystem. But I am not updating data on the drive, so I do not understand where the 100+ GB per week of extra data is coming from.
Here are some details:
on host
zfs list
NAME                      USED  AVAIL  REFER  MOUNTPOINT
entropool                 564G   358G    96K  /entropool
entropool/vm-100-disk-0   551G   358G   319G  -
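To separate logically stored data from raw allocation (including space still held by COW churn until the next trim), a few more properties can be queried on the zvol. This is just a diagnostic sketch using standard OpenZFS properties; the pool and zvol names match the ones in this thread:

```shell
# Compare logical vs. allocated space on the zvol
zfs get used,referenced,logicalused,usedbysnapshots,volsize entropool/vm-100-disk-0

# Pool-level view: allocated vs. free space per vdev
zpool list -v entropool
```

If `used` is far above `logicalused` while `usedbysnapshots` is near zero, the gap is mostly blocks that have been dereferenced but not yet discarded.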
Since a trim command seems to free space, the graph is probably reporting raw allocated data on the disk and not logically allocated data. That means your guess about this coming from ZFS being a copy-on-write filesystem is probably true. When data is changed, a new record is allocated at the beginning of the free space and the old block is dereferenced. On an SSD this means the block is marked as free but not actually erased; the erasure of the actual data is done by the trim command. I do not know what you do with your system, but I am guessing your average weekly change rate on the pool is what you are seeing being freed. This is just a guess, though; I don't know for sure since I am not using Proxmox.
Thank you for your reply. I should have enabled email notifications; I did not see your reply until now.
Yeah, this is what I am thinking as well. And I would accept that for the drives holding the VM disks where the operating systems reside, since I would expect some data to be written and erased there due to logging etc.
But I don’t understand it for the second one, which only holds data like photos that basically never get touched. It is just an ext4-formatted VM disk mounted by the Ubuntu server so that Syncthing (in a Docker container residing on the other disk) can serve the data.
I am getting really worried. I checked the SMART value for terabytes written on the data-only disk, and it went up from 8.8 TB to 9.7 TB in 21 days. There were 2 extra TB on the OS disk in the same time, which I could “accept”. I think this is way too much and will kill my SSD pretty soon.
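For scale, the SMART delta above works out to roughly 43 GB per day on the data disk. A quick sketch of that arithmetic, using only the numbers from this post:

```shell
# SMART "TB written" went from 8.8 to 9.7 TB in 21 days (values from this post)
start_tb=8.8
end_tb=9.7
days=21
# (9.7 - 8.8) TB = 900 GB over 21 days
gb_per_day=$(awk -v s="$start_tb" -v e="$end_tb" -v d="$days" \
    'BEGIN { printf "%.0f", (e - s) * 1000 / d }')
echo "${gb_per_day} GB/day"   # roughly 43 GB/day
```

That is a continuous ~0.5 MB/s of writes against data that supposedly never changes, which is why the number looks so suspicious.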
If you are really not writing much data to the disks, then the usage does seem excessive, I agree! The question is why, though. I do not know of any viruses, trojans, or other malware that write a lot of data to infected systems. Maybe you should also open a thread on the Proxmox forums to ask for help there; it might be something specific to Proxmox. I would back up my data, run ClamAV or another antivirus over the backed-up data, and reinstall the system. It is possible that this fixes it, or it could occur in the very same way again. This is the point where I am wildly guessing, though.
At first I thought it might be Syncthing hashing files, but it should only do that on changed files, and that would additionally only involve reading (with the hashes stored in the Docker container, I guess), so I really am confused. iotop also looks fine.
edit: I just saw that you can put iotop into “cumulative” mode. So I will let it run in screen overnight and see if something comes up.
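For reference, the cumulative mode mentioned above can also be enabled directly from the command line; `-a` (accumulated totals), `-o` (only processes actually doing I/O), and `-P` (processes rather than threads) are standard iotop flags:

```shell
# Accumulate I/O per process since start; only show processes doing I/O
sudo iotop -aoP
```

Left running in a screen/tmux session overnight, the WRITE column then shows total bytes written per process rather than the momentary rate, which makes the culprit much easier to spot.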
This would explain the data written in the Docker containers (/config), which I had kind of accepted as “necessary”. I can ask in the Syncthing and Home Assistant forums whether I could reduce that somehow, because this might be logging (like `systemd-journald`).
It is still a mystery what is writing to the data disk, and why.
I would most certainly check the Syncthing setup now. Check your Dockerfile! At this point my sixth sense is telling me your configuration works differently than you believe. I mean, you are looking for a heap of mystery data being written to the data disk, and you just now found out Syncthing is writing a heap of data each day.
Start with `sudo docker info` and look at the line containing `Docker Root Dir: <path to directory>`. Then make sure this directory is indeed mounted on the root partition and not on data!
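A minimal sketch of that check (the grep pattern matches the `docker info` output line named above; `df` then shows which filesystem actually backs the directory):

```shell
# Find where Docker stores its writable layers and volumes
sudo docker info 2>/dev/null | grep 'Docker Root Dir'
# Then confirm which mount actually backs it, e.g. for the default path:
df -h /var/lib/docker
```

If the `df` output shows the data mount instead of the root filesystem, every container log and config write lands on the data disk.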
I don’t quite understand your disk layout. Could you give me a zpool list -v and say which disks are getting the writes?
The first thing to do on all your ZFS pools (and datasets) is to check what atime and relatime are set to. If atime is on, that is a big culprit: it causes loads of pointless metadata updates every time Syncthing looks at anything. relatime usually shouldn’t be a problem, but sometimes it can be. It’s best to set atime=off.
# The following completely turns off atime
zfs set atime=off <pool_name>
# The following combination uses relatime instead of plain atime: https://github.com/openzfs/zfs/issues/2466
zfs set atime=on <pool_name>
zfs set relatime=on <pool_name>
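To verify what those commands actually changed, the current settings can be read back recursively; this is a standard `zfs get` invocation (the pool name is a placeholder):

```shell
# Show atime/relatime for the pool and every dataset beneath it
zfs get -r atime,relatime <pool_name>
```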
However, both Syncthing and Proxmox are known to be very noisy with their writes. I’m not really familiar with trying to silence them, though, nor with trying to drill down and figure out the culprit.
Both of them are getting lots of writes, but one of them is getting twice the amount of the other: entrodata → 1 TB vs rpool → 2 TB per month.
entrodata only has one VM disk on it, where syncthing is doing the accessing/serving.
rpool has all the “operating” VM disks on it, like the ubuntu VM.
I read up on atime.
Seems to me that this would be a good fix if that is indeed the cause. But I am not sure, because inside the VM the mount is already set to relatime:
mount | grep entrodata
/dev/sdb1 on /mnt/entrodata type ext4 (rw,relatime)
Which is weird, because iotop inside the VM shows these large amounts. Maybe I should set noatime in fstab?
Maybe I should additionally try to set atime=off on the zfs on the host as well?
edit:
I changed the ZFS pool entrodata to atime=off (rpool was already off). I also tried to change it on the zvols that serve as the VM disks, but zvols have no atime property.
I also added noatime to fstab inside the Ubuntu VM and rebooted, but mount still shows relatime.
iotop showed a giant 5 GB of writes in the first few minutes of Syncthing, so I guess that did not work and I have to figure out how to remount properly.
edit2: Error in fstab, noatime was in the wrong place.
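For anyone hitting the same mistake: mount options in fstab belong in the fourth field, comma-separated with no spaces. A sketch of a correct line for this mount (the UUID is the truncated placeholder from this thread, not a real value):

```shell
# /etc/fstab -- options go in the fourth field
# <device>               <mountpoint>     <type>  <options>          <dump> <pass>
/dev/disk/by-uuid/789    /mnt/entrodata   ext4    defaults,noatime   0      2
```

After fixing the line, `sudo mount -o remount,noatime /mnt/entrodata` (or a reboot) applies it, and `mount | grep entrodata` should then show noatime instead of relatime.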
I am not getting what this is doing at all. You mount the disk /dev/disk/by-uuid/789 to /mnt/entrodata. Why is this disk specified as ext4, and the mountpoint called entrodata, when I thought the writes go to the ZFS pool entropool? You don’t need entries in fstab to mount ZFS datasets. The only reason to use fstab entries with ZFS is if you use the legacy mount option on the datasets, but those would look different. I have no idea what I am seeing here, to be honest.
I guess I just have to live with those giant writes; even Home Assistant writes 1.6 GB per day. I could drill into Syncthing with some logging to see whether it is actually writing that much, but I doubt it. At least I am not seeing that amount of network traffic.
Thank you for trying to help and thinking it through!
I do have discard=on, which seems to be the same:
--discard=discard
Control whether discard (also known as trim or unmap) requests are ignored or passed
to the filesystem. discard is one of ignore (or off), unmap (or on). The default is
ignore.
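With discard=on on the VM disk, the guest can pass trim requests through continuously instead of waiting for the weekly timer. Inside the VM this can be checked or triggered manually with standard util-linux/systemd tools; a sketch:

```shell
# Inside the VM: see when the weekly trim timer last ran
systemctl status fstrim.timer
# Manually trim the data mount and report how much was discarded
sudo fstrim -v /mnt/entrodata
```

If a manual `fstrim` reports large amounts freed even though discard=on is set, the guest filesystem may also need the `discard` mount option for continuous trimming.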