We are running a zpool with 6 drives, consisting of a stripe of 3 mirrors:
    NAME              STATE     READ WRITE CKSUM
    motherducker      ONLINE       0     0     0
      mirror-0        ONLINE       0     0     0
        sdb           ONLINE       0     0     0
        sda           ONLINE       0     0     0
      mirror-1        ONLINE       0     0     0
        sde           ONLINE       0     0     0
        sdf           ONLINE       0     0     0
      mirror-2        ONLINE       0     0     0
        sdg           ONLINE       0     0     0
        sdf           ONLINE       0     0     0
When one of the drives is physically removed, the pool hangs the next time ZFS tries to sync:
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: qemu-kvm D ffff885b60ab8fd0 0 21248 1 0x00000080
kernel: Call Trace:
kernel: [] schedule+0x29/0x70
kernel: [] schedule_timeout+0x239/0x2c0
kernel: [] ? __wake_up+0x44/0x50
kernel: [] ? taskq_dispatch_ent+0x57/0x170 [spl]
kernel: [] io_schedule_timeout+0xad/0x130
kernel: [] ? prepare_to_wait_exclusive+0x56/0x90
kernel: [] io_schedule+0x18/0x20
kernel: [] cv_wait_common+0xb2/0x150 [spl]
kernel: [] ? wake_up_atomic_t+0x30/0x30
kernel: [] __cv_wait_io+0x18/0x20 [spl]
kernel: [] zio_wait+0x10b/0x1b0 [zfs]
kernel: [] zil_commit.part.12+0x4ae/0x830 [zfs]
kernel: [] zil_commit+0x17/0x20 [zfs]
kernel: [] zfs_fsync+0x77/0xf0 [zfs]
kernel: [] zpl_fsync+0x65/0x90 [zfs]
kernel: [] do_fsync+0x65/0xa0
kernel: [] SyS_fdatasync+0x13/0x20
kernel: [] system_call_fastpath+0x16/0x1b
Once this happens, the ZFS kernel module stays unresponsive and even blocks a normal reboot, so the only way out is a hard reset of the system.
Shouldn't it be possible to replace a physical device in a zpool without the whole pool freezing and the system needing a hard reset? That is the point of redundant storage, in this case in the form of mirrors.
(running zfsonlinux 0.7.5 on CentOS 7.4)
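
For comparison, here is a minimal sketch of the graceful path, where the mirror member is taken offline before it is pulled. It does not cover the surprise removal described above, and it is not a fix for the hang. The new device name `sdh`, the `DRY_RUN` switch, and the helper names `run` and `swap_mirror_member` are assumptions made up for illustration; only `zpool offline`, `zpool replace`, and `zpool status` are real commands.

```shell
#!/bin/sh
# Sketch only: offline a mirror member, swap the disk, then resilver onto the new one.
# POOL defaults to the pool name from this post; DRY_RUN=1 (the default here)
# prints the commands instead of running them.
POOL="${POOL:-motherducker}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: $1 = device to remove, $2 = replacement device.
swap_mirror_member() {
    old_dev="$1"
    new_dev="$2"
    # Tell ZFS to stop using the device before it disappears.
    run zpool offline "$POOL" "$old_dev"
    # (The physical swap of the drive would happen at this point.)
    # Attach the new disk in place of the old one and start the resilver.
    run zpool replace "$POOL" "$old_dev" "$new_dev"
    # Show pool state so the resilver progress can be checked.
    run zpool status "$POOL"
}
```

Calling `swap_mirror_member sdb sdh` with the defaults only prints the three `zpool` commands, so the sequence can be reviewed before setting `DRY_RUN=0`.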