ZFS Kernel Panic on heavy usage

I have 2 ZFS pools on an Unraid 6.8.3 server. I run an nzbget Docker container, and when it downloads (averaging around 800 Mbps) and unpacks at the same time, I get a kernel panic and the pool stops responding. Below is an example of the log output. This pool consists of two raidz vdevs of four 2 TB disks each (WD Red). From my research it seems to be triggered by high throughput? Is there something I can do to fix it, or should I move off ZFS for stability?

Apr 2 01:11:48 Tower kernel: PANIC: zfs: accessing past end of object e26/543cf (size=6656 access=6308+1033)
Apr 2 01:11:48 Tower kernel: Showing stack for process 25214
Apr 2 01:11:48 Tower kernel: CPU: 2 PID: 25214 Comm: nzbget Tainted: P O 4.19.107-Unraid #1
Apr 2 01:11:48 Tower kernel: Hardware name: ASUSTeK COMPUTER INC. Z9PE-D16 Series/Z9PE-D16 Series, BIOS 5601 06/11/2015
Apr 2 01:11:48 Tower kernel: Call Trace:
Apr 2 01:11:48 Tower kernel: dump_stack+0x67/0x83
Apr 2 01:11:48 Tower kernel: vcmn_err+0x8b/0xd4 [spl]
Apr 2 01:11:48 Tower kernel: ? spl_kmem_alloc+0xc9/0xfa [spl]
Apr 2 01:11:48 Tower kernel: ? _cond_resched+0x1b/0x1e
Apr 2 01:11:48 Tower kernel: ? mutex_lock+0xa/0x25
Apr 2 01:11:48 Tower kernel: ? dbuf_find+0x130/0x14c [zfs]
Apr 2 01:11:48 Tower kernel: ? _cond_resched+0x1b/0x1e
Apr 2 01:11:48 Tower kernel: ? mutex_lock+0xa/0x25
Apr 2 01:11:48 Tower kernel: ? arc_buf_access+0x69/0x1f4 [zfs]
Apr 2 01:11:48 Tower kernel: ? _cond_resched+0x1b/0x1e
Apr 2 01:11:48 Tower kernel: zfs_panic_recover+0x67/0x7e [zfs]
Apr 2 01:11:48 Tower kernel: ? spl_kmem_zalloc+0xd4/0x107 [spl]
Apr 2 01:11:48 Tower kernel: dmu_buf_hold_array_by_dnode+0x92/0x3b6 [zfs]
Apr 2 01:11:48 Tower kernel: dmu_write_uio_dnode+0x46/0x11d [zfs]
Apr 2 01:11:48 Tower kernel: ? txg_rele_to_quiesce+0x24/0x32 [zfs]
Apr 2 01:11:48 Tower kernel: dmu_write_uio_dbuf+0x48/0x5e [zfs]
Apr 2 01:11:48 Tower kernel: zfs_write+0x6a3/0xbe8 [zfs]
Apr 2 01:11:48 Tower kernel: zpl_write_common_iovec+0xae/0xef [zfs]
Apr 2 01:11:48 Tower kernel: zpl_iter_write+0xdc/0x10d [zfs]
Apr 2 01:11:48 Tower kernel: do_iter_readv_writev+0x110/0x146
Apr 2 01:11:48 Tower kernel: do_iter_write+0x86/0x15c
Apr 2 01:11:48 Tower kernel: vfs_writev+0x90/0xe2
Apr 2 01:11:48 Tower kernel: ? list_lru_add+0x63/0x13a
Apr 2 01:11:48 Tower kernel: ? vfs_ioctl+0x19/0x26
Apr 2 01:11:48 Tower kernel: ? do_vfs_ioctl+0x533/0x55d
Apr 2 01:11:48 Tower kernel: ? syscall_trace_enter+0x163/0x1aa
Apr 2 01:11:48 Tower kernel: do_writev+0x6b/0xe2
Apr 2 01:11:48 Tower kernel: do_syscall_64+0x57/0xf2
Apr 2 01:11:48 Tower kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 2 01:11:48 Tower kernel: RIP: 0033:0x14c478acbf90
Apr 2 01:11:48 Tower kernel: Code: 89 74 24 10 48 89 e5 48 89 04 24 49 29 c6 48 89 54 24 18 4c 89 74 24 08 49 01 d6 48 63 7b 78 49 63 d7 4c 89 e8 48 89 ee 0f 05 <48> 89 c7 e8 1b 85 fd ff 49 39 c6 75 19 48 8b 43 58 48 8b 53 60 48
Apr 2 01:11:48 Tower kernel: RSP: 002b:000014c478347640 EFLAGS: 00000216 ORIG_RAX: 0000000000000014
Apr 2 01:11:48 Tower kernel: RAX: ffffffffffffffda RBX: 0000558040d4e920 RCX: 000014c478acbf90
Apr 2 01:11:48 Tower kernel: RDX: 0000000000000002 RSI: 000014c478347640 RDI: 0000000000000005
Apr 2 01:11:48 Tower kernel: RBP: 000014c478347640 R08: 0000000000000001 R09: 000014c478b15873
Apr 2 01:11:48 Tower kernel: R10: 0000000000000006 R11: 0000000000000216 R12: 000000000000000b
Apr 2 01:11:48 Tower kernel: R13: 0000000000000014 R14: 0000000000000409 R15: 0000000000000002


How does it compare to the "Panic: zfs: accessing past end of object" reports on the ZoL GitHub issue tracker?
It might help to post a bug report, if it is consistent?
Looks like it has been intermittent with non-ECC memory, and has been noticed on and off in the past.

There are a couple of things you might try. One is setting

zfs_recover = 1

as a temporary kludge to keep the pool up while it is hitting these errors.
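A rough sketch of how to flip it at runtime (assuming the parameter is exposed under sysfs, which it is on ZFS on Linux; note that Unraid runs from RAM, so this will not survive a reboot and would need to be re-applied, e.g. from your go file):

    # turn the zfs_panic_recover() path into a warning instead of a panic
    echo 1 > /sys/module/zfs/parameters/zfs_recover
    # confirm it took effect
    cat /sys/module/zfs/parameters/zfs_recover
    # on a regular distro you could persist it across reboots with a modprobe option:
    # echo "options zfs zfs_recover=1" > /etc/modprobe.d/zfs.conf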

Apart from that, just make sure the ZFS software on the box is up to date (not the pool itself, just the OS-side kernel module and userland tools).
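A quick way to check what is actually loaded (a hedged sketch; the zfs version subcommand only exists in 0.8 and newer):

    cat /sys/module/zfs/version     # version of the loaded ZFS kernel module
    zfs version                     # userland and kernel versions, on 0.8+
    dmesg | grep -i "zfs: loaded"   # the module load banner also prints the version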

This looks like it addresses the issue. I could test it if I knew how to update the ZFS plugin on Unraid…


Looks like the author of the plugin does update it often. Maybe leave a comment on the thread he has open, explain the issue you have and the PR you found, and see if he can/will update it, or can suggest a different course or tip in the meantime?

Posted maybe an hour ago:

https://forums.unraid.net/topic/41333-zfs-plugin-for-unraid/page/14/

I originally posted the issue in that thread (page 13). I just posted again with the PR. Thanks for your help!


Nice, I see it now, and here's hoping the guy can help.
Would you post back here if it gets fixed, for posterity?

Didn't work. Still got the panic. Will be posting to their GitHub.
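For the report I'll gather the usual environment details first (a rough checklist; exact paths may differ on Unraid):

    uname -a                            # kernel: 4.19.107-Unraid
    cat /sys/module/zfs/version         # loaded ZFS module version
    zpool status                        # pool layout and health
    grep "PANIC: zfs" /var/log/syslog   # the panic lines shown above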