I have OpenZFS 2.1.5 running on an Ubuntu 22.04 system that I use to host a handful of virtual machines via KVM. Each VM gets its own dataset with its disk image file inside it.
The pool runs on a mirror of two Samsung PM983 3.84 TB NVMe SSDs.
On Monday, the largest and most active of my VMs (running a XenForo forum site) hung. I reset the VM, but it would not power back on due to the following error:
libvirtd[1914]: internal error: qemu unexpectedly closed the monitor: 2023-05-08T23:10:55.259267Z qemu-system-x86_64: -blockdev {"node-name":"libvirt-2-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-2-storage","backing":null}: Failed to get "write" lock#012Is another process using the image
No, nothing was using the image; lsof reported nothing holding the file open. I ended up having to reboot the entire host, and after the restart the VM fired up fine.
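For reference, this is roughly the check I did from the host; the path here is just an example, not my actual layout (fuser is there as a second opinion):

# check for any process with the qcow2 open (example path)
lsof /VMSTOR/forum-vm/forum.qcow2
fuser -v /VMSTOR/forum-vm/forum.qcow2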
The following morning (Tuesday), I rebuilt the forum's search index because Elasticsearch had not started up with the VM. It wasn't set to autostart, so no big deal there.
Then on Tuesday evening, I started getting reports of the site having issues again. This time I could actually log in since the system wasn't hung, but the database was stuck in a restart loop:
May 10 03:01:01 kernel: [98042.450194] blk_update_request: I/O error, dev vda, sector 1329356896 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
May 10 03:01:01 kernel: [98042.450802] Buffer I/O error on dev vda2, logical block 165894156, async page read
May 10 03:01:01 kernel: [98042.452217] blk_update_request: I/O error, dev vda, sector 1329356896 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
May 10 03:01:01 kernel: [98042.452892] Buffer I/O error on dev vda2, logical block 165894156, async page read
May 10 03:01:01 mariadbd[441371]: 2023-05-10 3:01:01 0 [Warning] InnoDB: 10678272 bytes should have been read. Only 2338816 bytes read. Retrying for the remaining bytes.
May 10 03:01:01 mariadbd[441371]: 2023-05-10 3:01:01 0 [Warning] InnoDB: Retry attempts for reading partial data failed.
May 10 03:01:01 mariadbd[441371]: 2023-05-10 3:01:01 0 [ERROR] InnoDB: Operating system error number 5 in a file operation.
May 10 03:01:01 mariadbd[441371]: 2023-05-10 3:01:01 0 [ERROR] InnoDB: Error number 5 means 'Input/output error'
May 10 03:01:01 mariadbd[441371]: 2023-05-10 3:01:01 0 [Note] InnoDB: Some operating system error numbers are described at https://mariadb.com/kb/en/library/operating-system-error-codes/
May 10 03:01:01 mariadbd[441371]: 2023-05-10 3:01:01 0 [ERROR] InnoDB: File (unknown): 'read' returned OS error 205. Cannot continue operation
May 10 03:01:01 mariadbd[441371]: 230510 3:01:01 [ERROR] mysqld got signal 6 ;
May 10 03:01:01 mariadbd[441371]: This could be because you hit a bug. It is also possible that this binary
May 10 03:01:01 mariadbd[441371]: or one of the libraries it was linked against is corrupt, improperly built,
May 10 03:01:01 mariadbd[441371]: or misconfigured. This error can also be caused by malfunctioning hardware.
May 10 03:01:01 mariadbd[441371]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
That is no bueno for sure: I/O errors inside the VM. So I checked the ZFS pool on the host and saw this:
zpool status -v
  pool: VMSTOR
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:16:58 with 0 errors on Sun May 7 00:46:59 2023
config:

        NAME                                                      STATE     READ WRITE CKSUM
        VMSTOR                                                    ONLINE       0     0     0
          mirror-0                                                ONLINE       0     0     0
            nvme-SAMSUNG_MZQLB3T8HBLS-00AAZ_S5GUNE0N121688-part3  ONLINE       0     0 1.58K
            nvme-SAMSUNG_MZQLB3T8HBLS-00AAZ_S5GUNE0N124416-part3  ONLINE       0     0 1.58K
So there are checksum errors on both drives and data corruption in the VM's image file. I back up all of the datasets via ZFS send/receive to a separate server, but that sync was behind because of the data churn from the day before. I also have more recent snapshots on this pool, but I don't know when the corruption started, so I'm not sure how far back I need to go.
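My current plan for bracketing when it started is something like the loop below: list the snapshots of the affected dataset oldest-first and try to read the image out of each one through the hidden .zfs directory, on the assumption that a snapshot referencing the bad blocks will fail the read with an I/O error. Dataset and file names here are made up, not my actual layout, and I'm also assuming a scrub followed by zpool status -v would list which snapshots reference the corrupted file.

# list snapshots of the affected dataset, oldest first (names are examples)
zfs list -t snapshot -o name,creation -s creation VMSTOR/forum-vm

# try to read the image from each snapshot via the .zfs directory;
# snapshots holding the corrupted blocks should error out
for snap in $(zfs list -H -t snapshot -o name -s creation VMSTOR/forum-vm | awk -F@ '{print $2}'); do
  echo "== $snap =="
  dd if=/VMSTOR/forum-vm/.zfs/snapshot/$snap/forum.qcow2 of=/dev/null bs=1M status=none \
    && echo "reads clean" || echo "read error"
done

That only helps if the corrupted blocks actually live inside the image file itself, which is what the ZFS-8000-8A message seems to imply.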
Any thoughts on what might have caused this, how I can find out when it started, and/or how to prevent this from happening again?
SMART looks clean on these two drives, and the other VMs in the other datasets don’t seem to have any problems. I’m surprised a ZFS mirror could crater this hard. As Wendell said in the RAID video a while back, ZFS has “paranoid levels of paranoia” around data integrity.
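For completeness, this is roughly what I looked at for the SMART check mentioned above; device names are examples since the NVMe devices may enumerate differently:

# NVMe health/SMART data (device names are examples)
smartctl -a /dev/nvme0
nvme smart-log /dev/nvme0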