Mystery Writes to Hard Drive

Hi,

I am very new to self hosting and trying to learn, so please let me know if I missed some critical information.

For a while now I have been looking at something strange on my server:

I am running Proxmox with root on a ZFS mirror and a few VMs on it. All of the VM system disks are zvols on that first ZFS mirror.

The Ubuntu server VM has two VM disks: one on the first mirror for the system, and a zvol on a second ZFS mirror used only for data.
The data zvol is formatted as ext4 and mounted inside the guest. The data is basically photos and similar files, so it does not change much (a few hundred MB per week); it is shared to the network via Samba and also updated via syncthing.

But if I check the disk usage in Proxmox, I see the following:

So the disk seems to run “full” until the weekly fstrim cuts it back down again. I thought this would be normal if you have a lot of writes, since ZFS is a copy-on-write filesystem. But I am not updating the data on the drive, so I do not understand where the 100+ GB per week of extra data is coming from.

Here are some details:
on host

zfs list
NAME                                     USED  AVAIL     REFER  MOUNTPOINT
entropool                                564G   358G       96K  /entropool
entropool/vm-100-disk-0                  551G   358G      319G  -
qm config 100
agent: 1
boot: order=scsi0;net0
cores: 4
memory: 8046
name: fred
net0: virtio=FA:3D:FB:43:C3:72,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
parent: before_installing_3700X
scsi0: local-zfs:vm-100-disk-0,discard=on,size=100G
scsi2: entropool:vm-100-disk-0,backup=0,discard=on,size=900G

on guest

# my entrodota mount
/dev/disk/by-uuid/123456 /mnt/entrodata ext4 defaults 0 0

I would be grateful for any pointers in the right direction/explanations :slight_smile:

I am just giving this a small bump; maybe someone has an idea how to investigate further? :slightly_smiling_face:

If a trim command seems to free space, then the graph is probably reporting raw allocated data on the disk and not logically allocated data. That means your guess about this coming from ZFS being a copy-on-write filesystem is probably right. When data is changed, a new record is allocated at the beginning of the free space and the old block is dereferenced. On an SSD this means the block is marked as free but not actually erased; the actual erasure happens when the trim command runs. I do not know what you do with your system, but I am guessing your average weekly change rate on the pool is what you are seeing being freed. This is just a guess, though; I don't know for sure since I am not using Proxmox.
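If you want to compare the two views yourself, something along these lines should show physically allocated vs. logically used space (dataset name taken from your zfs list output, so adjust as needed):

# compare physically allocated space with logically used space on the zvol
zfs list -o name,used,logicalused,referenced,usedbysnapshots entropool/vm-100-disk-0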


Hi Sapiens,

Thank you for your reply. I should have enabled email notifications; I did not see your reply until now.

Yeah, this is what I am thinking as well. And I would accept that for the drives which hold the VM disks where the operating systems reside, since I would expect some data to be written/erased there due to logging etc.
But I don't understand it for the second one, which only holds data like photos that basically never get touched. It is just an ext4-formatted VM disk mounted by the Ubuntu server so that syncthing (in a Docker container residing on the other disk) can serve the data.

I am getting really worried: I checked the SMART values for terabytes written on the data-only disk, and it went up from 8.8 TB to 9.7 TB in 21 days. There were 2 extra TB on the OS disk in the same time, which I could “accept”. I think this is way too much and will kill my SSD pretty soon. :confused:
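For reference, this is roughly how I am reading those totals (the device path is just an example for one of my NVMe drives):

# total written according to the NVMe SMART log
sudo smartctl -a /dev/nvme0n1 | grep -i "data units written"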

In the case that you are not really writing much data to the disks, the usage does seem excessive, I agree! The question is why, though. I do not know of any viruses, trojans or other malware that would write a lot of data to infected systems. Maybe you should also open a thread on the Proxmox forums to ask for help there; it might be something wrong specific to Proxmox. I would back up my data, run ClamAV or another antivirus over the backed-up data and reinstall the system. It is possible that this fixes it, or it could occur in the very same way again. This is the point where I am wildly guessing, though.

Alright, thank you.

At first I thought it might be syncthing hashing the files, but it should only do that on changed files, and that would additionally only be reading (with the hash index, I guess, stored in the Docker container), so I really am confused.
iotop also looks fine.

edit: I just saw that you can put iotop into “cumulative” mode. So I will let it run in screen overnight and see if something comes up.
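Roughly what I plan to run (the session name is arbitrary; -a accumulates totals since start, -o only shows processes that actually did I/O):

# start iotop in a detached screen session and read the totals tomorrow
sudo screen -dmS iowatch iotop -ao
sudo screen -r iowatch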


Alright, here is 23 hours:

This would explain the data written in the Docker containers (/config), which I kind of accepted as “necessary”. I can ask in the syncthing and Home Assistant forums whether I can reduce that somehow, because this might be logging (like systemd-journald).
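(Side note, in case it really is journald: this is roughly what I would check and cap, with the size limit just being an example value.)

# how much space the journal currently uses
journalctl --disk-usage
# cap it by setting e.g. SystemMaxUse=200M in /etc/systemd/journald.conf, then:
sudo systemctl restart systemd-journald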
It is still a mystery who is writing to the data disk, and why. :sweat: :smile:


I would most certainly check the syncthing setup now. Check your Dockerfile! At this point my sixth sense is telling me your configuration is working differently than you believe. I mean, you are looking for a heap of mystery data that gets written to the data disk, and you just now found out that syncthing is writing a heap of data each day.

Start with sudo docker info and look at the line containing Docker Root Dir: <path to directory>. Then make sure this directory is indeed mounted on the root partition and not on the data one!
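Something along these lines should do it (assuming the default install paths):

# where Docker keeps its data
sudo docker info | grep "Docker Root Dir"
# which mount that directory actually lives on
findmnt -T /var/lib/docker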

This all seems to be fine:
docker-compose.yml shows

 volumes:
  - /home/user/syncthing_config:/config
  - /mnt/entrodata:/entrodota

With docker info showing Docker Root Dir: /var/lib/docker
and fstab

/dev/disk/by-id/dm-uuid-LVM-123 / ext4 defaults 0 0
/dev/disk/by-uuid/456 /boot ext4 defaults 0 0
/swap.img       none    swap    sw      0       0
# my entrodota mount
/dev/disk/by-uuid/789 /mnt/entrodata ext4 defaults 0 0

So /var/lib/docker is mounted under / and should be fine. And the /config for syncthing is mounted under /home/user/, which is also fine.

I don’t quite understand your disk layout. Could you give me a zpool list -v and say which disks are getting the writes?

The first thing to do, on all your ZFS pools (and datasets), is to check what atime and relatime are set to. If atime is on, that is a big culprit for loads of pointless metadata updates every time syncthing looks at anything. relatime usually shouldn't be a problem, but sometimes it can be. It's best to set atime=off.

#The following completely turns off atime
zfs set atime=off <pool_name>
#The following two, combined, use relatime instead of atime: https://github.com/openzfs/zfs/issues/2466
zfs set atime=on <pool_name>
zfs set relatime=on <pool_name>
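To see what a pool and its datasets are currently set to before changing anything, a check along these lines should work (the pool name is a placeholder):

zfs get -r atime,relatime <pool_name>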

However, both syncthing and Proxmox are known to be very noisy with their writes. I'm not really familiar with trying to silence them, though, nor with trying to drill down and figure out the culprit.

Hi!
Yeah sure:

NAME                                                 SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
entropool                                            952G   582G   370G        -         -     2%    61%  1.00x    ONLINE  -
  mirror                                             952G   582G   370G        -         -     2%  61.1%      -    ONLINE
    nvme-SAMSUNG_MZVLB1T0HBLR-00000_S4GJNF0N111967      -      -      -        -         -      -      -      -    ONLINE
    nvme-SAMSUNG_MZVLB1T0HBLR-00000_S4GJNF0N111971      -      -      -        -         -      -      -      -    ONLINE
rpool                                                952G   480G   472G        -         -    30%    50%  1.00x    ONLINE  -
  mirror                                             952G   480G   472G        -         -    30%  50.5%      -    ONLINE
    nvme-eui.0025388101b98f77-part3                     -      -      -        -         -      -      -      -    ONLINE
    nvme-eui.0025388101b98fd0-part3                     -      -      -        -         -      -      -      -    ONLINE

Both of them are getting lots of writes, but one of them is getting twice the amount of the other: entropool → 1 TB vs rpool → 2 TB per month.
entropool only has one VM disk on it, the one where syncthing is doing the accessing/serving.
rpool has all the “operating” VM disks on it, like the Ubuntu VM's.
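If it helps, I can watch where the writes land live on the host with something like this (5-second interval, both pools from the output above):

zpool iostat -v entropool rpool 5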

I read up on atime.
It seems to me that this would be a good fix if that is indeed the cause. But I am not sure, because inside the VM the mount

mount | grep entrodata
/dev/sdb1 on /mnt/entrodata type ext4 (rw,relatime)

is set to relatime, which is weird because iotop inside the VM still shows these large amounts. Maybe I should set noatime in fstab?

Maybe I should additionally try to set atime=off on the zfs on the host as well?

edit:
I changed the ZFS pool entropool to atime=off (rpool was already off). I also tried to change it on the zvols that serve as the VM disks, but zvols have no atime property.
I also added noatime to the fstab inside the Ubuntu VM and rebooted, but mount still shows relatime.

iotop showed a giant 5 GB of writes in the first few minutes of syncthing, so I guess that did not work and I have to figure out how to remount properly.

edit2: Error in fstab, noatime was in the wrong place. :smiley:
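For completeness, the corrected line in the guest's fstab now looks roughly like this (noatime belongs in the options field), plus a remount to apply it without a reboot:

/dev/disk/by-uuid/789 /mnt/entrodata ext4 defaults,noatime 0 0
sudo mount -o remount,noatime /mnt/entrodata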

I am not getting what this is doing at all. You mount the disk /dev/disk/by-uuid/789 to /mnt/entrodata. Why is this disk specified as ext4 and the mountpoint called entrodata, when I thought the writes go to the ZFS pool entropool? You don't need entries in fstab to mount ZFS datasets. The only reason to use fstab entries with ZFS would be if you used the legacy mount option on the datasets, but those would look different. I have no idea what I am seeing here, to be honest.

The ZFS is on the host, providing zvols as VM disks. The guest's fstab mounts those VM disks without knowing anything about ZFS.

This is the only way I know to pass VM disks through to a guest. If you know how to directly mount the host's ZFS in the guest, let me know!

Alright, so 22 hours later: the atime=off and noatime did not change anything.

I guess I just have to live with those giant writes. Even Home Assistant is writing 1.6 GB per day. I can dig into syncthing and enable some logging to see whether it really is writing that much, but I doubt it. At least I am not seeing that amount of network traffic.

Is this a typo? Is it present on the actual machine?

You mount the pool at /entropool. Can you please give me info on the path used to access the zvols?

The information is in the post starting this thread.

Maybe your confusion is that vm-100-disk-0 (on the host) shows up as just a /dev/disk/by-uuid/123456 entry in the fstab (on the guest).
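In case it helps to line the two views up, roughly (the guest device letter is just an example, check with lsblk):

# on the host: the zvol backing scsi2
ls -l /dev/zvol/entropool/vm-100-disk-0
# in the guest: the same disk appears as a plain block device with the ext4 UUID from fstab
lsblk -f /dev/sdb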

I am finally understanding your config. I am not seeing anything wrong with it, though. The only tip I have left is to set the discard parameter to unmap.

Thank you for trying to help and thinking it through!
I do have discard=on, which seems to be the same thing:

--discard=discard
           Control whether discard (also known as trim or unmap) requests are ignored or passed
           to the filesystem.  discard is one of ignore (or off), unmap (or on).  The default is
           ignore.

https://manpages.ubuntu.com/manpages/focal/man8/qemu-nbd.8.html
(But I am not 100% sure that qemu-nbd is actually the thing Proxmox uses here.)
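For what it is worth, this is roughly how I would check inside the guest that discards actually reach the zvol (the mountpoint is the one from my fstab above, the device letter is an example):

# non-zero DISC-GRAN/DISC-MAX means the virtual disk accepts discard requests
lsblk -D /dev/sdb
# manually trim the data mount and see how much gets reported
sudo fstrim -v /mnt/entrodata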