Return to Level1Techs.com

Troubleshoot zfs error: PANIC at zfs_znode.c:611:zfs_znode_alloc()

Hello,

Any help troubleshooting would be greatly appreciated. I have a Proxmox system that is giving a kernel panic which seems to happen randomly (doesn’t do it every time) when using a zfs send command.

  • system is running proxmox 6.0
  • pyznap to run the zfs send command
  • send command is sending between pools on the same machine
  • both pools show zeros errors on zpool status.
  • running amd 3900x on MSI meg ace x570 board

Kernel
⁠Linux version 5.0.15-1-pve ([email protected]) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.0.15-1 (Wed, 03 Jul 2019 10:51:57 +0200)

ZFS
⁠⁠zfs-0.8.2-pve2
⁠zfs-kmod-0.8.1-pve

Googling provide much help with the messages below:

dmesg gives the following:

 VERIFY3(insert_inode_locked(ip) == 0) failed (-16 == 0)
[113512.912807] PANIC at zfs_znode.c:611:zfs_znode_alloc()
[113512.912808] Showing stack for process 1975
[113512.912811] CPU: 4 PID: 1975 Comm: z_unlinked_drai Tainted: P           O      5.0.15-1-pve #1
[113512.912813] Hardware name: Micro-Star International Co., Ltd. MS-7C35/MEG X570 ACE (MS-7C35), BIOS 1.50 09/16/2019
[113512.912815] Call Trace:
[113512.912822]  dump_stack+0x63/0x8a
[113512.912830]  spl_dumpstack+0x29/0x2b [spl]
[113512.912833]  spl_panic+0xd3/0xfb [spl]
[113512.912837]  ? spl_kmem_cache_free+0xc0/0x1d0 [spl]
[113512.912873]  ? zfs_inode_destroy+0xf8/0x110 [zfs]
[113512.912901]  ? zpl_inode_destroy+0xe/0x10 [zfs]
[113512.912903]  ? destroy_inode+0x3e/0x60
[113512.912905]  ? evict+0x139/0x1a0
[113512.912906]  ? iput+0x148/0x210
[113512.912908]  ? insert_inode_locked+0x1d8/0x1e0
[113512.912933]  zfs_znode_alloc+0x625/0x680 [zfs]
[113512.912958]  zfs_zget+0x1ad/0x240 [zfs]
[113512.912984]  zfs_unlinked_drain_task+0x74/0x100 [zfs]
[113512.912987]  ? __switch_to_asm+0x34/0x70
[113512.912989]  ? __switch_to_asm+0x40/0x70
[113512.912990]  ? __switch_to_asm+0x34/0x70
[113512.912991]  ? __switch_to_asm+0x40/0x70
[113512.912993]  ? __switch_to_asm+0x34/0x70
[113512.912996]  ? finish_task_switch+0xcd/0x270
[113512.912998]  ? __schedule+0x2dc/0x870
[113512.913000]  ? remove_wait_queue+0x4d/0x60
[113512.913004]  taskq_thread+0x2ec/0x4d0 [spl]
[113512.913007]  ? wake_up_q+0x80/0x80
[113512.913009]  kthread+0x120/0x140
[113512.913013]  ? task_done+0xb0/0xb0 [spl]
[113512.913015]  ? __kthread_parkme+0x70/0x70
[113512.913016]  ret_from_fork+0x22/0x40
[113704.620941] INFO: task z_unlinked_drai:1975 blocked for more than 120 seconds.
[113704.620947]       Tainted: P           O      5.0.15-1-pve #1
[113704.620949] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[113704.620951] z_unlinked_drai D    0  1975      2 0x80000000
[113704.620954] Call Trace:
[113704.620961]  __schedule+0x2d4/0x870
[113704.620965]  schedule+0x2c/0x70
[113704.620971]  spl_panic+0xf9/0xfb [spl]
[113704.620975]  ? spl_kmem_cache_free+0xc0/0x1d0 [spl]
[113704.621009]  ? zfs_inode_destroy+0xf8/0x110 [zfs]
[113704.621036]  ? zpl_inode_destroy+0xe/0x10 [zfs]
[113704.621038]  ? destroy_inode+0x3e/0x60
[113704.621040]  ? evict+0x139/0x1a0
[113704.621041]  ? iput+0x148/0x210
[113704.621043]  ? insert_inode_locked+0x1d8/0x1e0
[113704.621068]  zfs_znode_alloc+0x625/0x680 [zfs]
[113704.621093]  zfs_zget+0x1ad/0x240 [zfs]
[113704.621119]  zfs_unlinked_drain_task+0x74/0x100 [zfs]
[113704.621122]  ? __switch_to_asm+0x34/0x70
[113704.621123]  ? __switch_to_asm+0x40/0x70
[113704.621125]  ? __switch_to_asm+0x34/0x70
[113704.621126]  ? __switch_to_asm+0x40/0x70
[113704.621127]  ? __switch_to_asm+0x34/0x70
[113704.621130]  ? finish_task_switch+0xcd/0x270
[113704.621132]  ? __schedule+0x2dc/0x870
[113704.621135]  ? remove_wait_queue+0x4d/0x60
[113704.621139]  taskq_thread+0x2ec/0x4d0 [spl]
[113704.621141]  ? wake_up_q+0x80/0x80
[113704.621143]  kthread+0x120/0x140
[113704.621147]  ? task_done+0xb0/0xb0 [spl]
[113704.621149]  ? __kthread_parkme+0x70/0x70
[113704.621150]  ret_from_fork+0x22/0x40
[113825.453664] INFO: task z_unlinked_drai:1975 blocked for more than 120 seconds.
[113825.453670]       Tainted: P           O      5.0.15-1-pve #1
[113825.453672] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[113825.453674] z_unlinked_drai D    0  1975      2 0x80000000
[113825.453678] Call Trace:
[113825.453685]  __schedule+0x2d4/0x870
[113825.453689]  schedule+0x2c/0x70
[113825.453695]  spl_panic+0xf9/0xfb [spl]
[113825.453700]  ? spl_kmem_cache_free+0xc0/0x1d0 [spl]
[113825.453741]  ? zfs_inode_destroy+0xf8/0x110 [zfs]
[113825.453773]  ? zpl_inode_destroy+0xe/0x10 [zfs]
[113825.453776]  ? destroy_inode+0x3e/0x60
[113825.453778]  ? evict+0x139/0x1a0
[113825.453779]  ? iput+0x148/0x210
[113825.453781]  ? insert_inode_locked+0x1d8/0x1e0
[113825.453812]  zfs_znode_alloc+0x625/0x680 [zfs]
[113825.453841]  zfs_zget+0x1ad/0x240 [zfs]
[113825.453872]  zfs_unlinked_drain_task+0x74/0x100 [zfs]
[113825.453875]  ? __switch_to_asm+0x34/0x70
[113825.453877]  ? __switch_to_asm+0x40/0x70
[113825.453878]  ? __switch_to_asm+0x34/0x70
[113825.453880]  ? __switch_to_asm+0x40/0x70
[113825.453882]  ? __switch_to_asm+0x34/0x70
[113825.453885]  ? finish_task_switch+0xcd/0x270
[113825.453887]  ? __schedule+0x2dc/0x870
[113825.453889]  ? remove_wait_queue+0x4d/0x60
[113825.453894]  taskq_thread+0x2ec/0x4d0 [spl]
[113825.453897]  ? wake_up_q+0x80/0x80
[113825.453900]  kthread+0x120/0x140
[113825.453904]  ? task_done+0xb0/0xb0 [spl]
[113825.453906]  ? __kthread_parkme+0x70/0x70
[113825.453908]  ret_from_fork+0x22/0x40
[113946.286504] INFO: task z_unlinked_drai:1975 blocked for more than 120 seconds.
[113946.286509]       Tainted: P           O      5.0.15-1-pve #1
[113946.286510] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[113946.286513] z_unlinked_drai D    0  1975      2 0x80000000
[113946.286515] Call Trace:
[113946.286521]  __schedule+0x2d4/0x870
[113946.286525]  schedule+0x2c/0x70
[113946.286531]  spl_panic+0xf9/0xfb [spl]
[113946.286535]  ? spl_kmem_cache_free+0xc0/0x1d0 [spl]
[113946.286569]  ? zfs_inode_destroy+0xf8/0x110 [zfs]
[113946.286596]  ? zpl_inode_destroy+0xe/0x10 [zfs]
[113946.286599]  ? destroy_inode+0x3e/0x60
[113946.286600]  ? evict+0x139/0x1a0
[113946.286602]  ? iput+0x148/0x210
[113946.286603]  ? insert_inode_locked+0x1d8/0x1e0
[113946.286629]  zfs_znode_alloc+0x625/0x680 [zfs]
[113946.286654]  zfs_zget+0x1ad/0x240 [zfs]
[113946.286681]  zfs_unlinked_drain_task+0x74/0x100 [zfs]
[113946.286683]  ? __switch_to_asm+0x34/0x70
[113946.286685]  ? __switch_to_asm+0x40/0x70
[113946.286686]  ? __switch_to_asm+0x34/0x70
[113946.286687]  ? __switch_to_asm+0x40/0x70
[113946.286689]  ? __switch_to_asm+0x34/0x70
[113946.286691]  ? finish_task_switch+0xcd/0x270
[113946.286693]  ? __schedule+0x2dc/0x870
[113946.286695]  ? remove_wait_queue+0x4d/0x60
[113946.286700]  taskq_thread+0x2ec/0x4d0 [spl]
[113946.286702]  ? wake_up_q+0x80/0x80
[113946.286704]  kthread+0x120/0x140
[113946.286708]  ? task_done+0xb0/0xb0 [spl]
[113946.286710]  ? __kthread_parkme+0x70/0x70
[113946.286711]  ret_from_fork+0x22/0x40
[114067.119208] INFO: task z_unlinked_drai:1975 blocked for more than 120 seconds.
[114067.119213]       Tainted: P           O      5.0.15-1-pve #1
[114067.119214] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[114067.119216] z_unlinked_drai D    0  1975      2 0x80000000
[114067.119219] Call Trace:
[114067.119226]  __schedule+0x2d4/0x870
[114067.119229]  schedule+0x2c/0x70
[114067.119235]  spl_panic+0xf9/0xfb [spl]
[114067.119240]  ? spl_kmem_cache_free+0xc0/0x1d0 [spl]
[114067.119274]  ? zfs_inode_destroy+0xf8/0x110 [zfs]
[114067.119301]  ? zpl_inode_destroy+0xe/0x10 [zfs]
[114067.119304]  ? destroy_inode+0x3e/0x60
[114067.119305]  ? evict+0x139/0x1a0
[114067.119307]  ? iput+0x148/0x210
[114067.119308]  ? insert_inode_locked+0x1d8/0x1e0
[114067.119334]  zfs_znode_alloc+0x625/0x680 [zfs]
[114067.119359]  zfs_zget+0x1ad/0x240 [zfs]
[114067.119385]  zfs_unlinked_drain_task+0x74/0x100 [zfs]
[114067.119387]  ? __switch_to_asm+0x34/0x70
[114067.119389]  ? __switch_to_asm+0x40/0x70
[114067.119390]  ? __switch_to_asm+0x34/0x70
[114067.119391]  ? __switch_to_asm+0x40/0x70
[114067.119393]  ? __switch_to_asm+0x34/0x70
[114067.119395]  ? finish_task_switch+0xcd/0x270
[114067.119397]  ? __schedule+0x2dc/0x870
[114067.119399]  ? remove_wait_queue+0x4d/0x60
[114067.119404]  taskq_thread+0x2ec/0x4d0 [spl]
[114067.119406]  ? wake_up_q+0x80/0x80
[114067.119408]  kthread+0x120/0x140
[114067.119412]  ? task_done+0xb0/0xb0 [spl]
[114067.119413]  ? __kthread_parkme+0x70/0x70
[114067.119415]  ret_from_fork+0x22/0x40
[114110.523937]  zd96: p1
[114187.952077] INFO: task z_unlinked_drai:1975 blocked for more than 120 seconds.
[114187.952083]       Tainted: P           O      5.0.15-1-pve #1
[114187.952084] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[114187.952086] z_unlinked_drai D    0  1975      2 0x80000000
[114187.952089] Call Trace:
[114187.952096]  __schedule+0x2d4/0x870
[114187.952099]  schedule+0x2c/0x70
[114187.952106]  spl_panic+0xf9/0xfb [spl]
[114187.952110]  ? spl_kmem_cache_free+0xc0/0x1d0 [spl]
[114187.952146]  ? zfs_inode_destroy+0xf8/0x110 [zfs]
[114187.952174]  ? zpl_inode_destroy+0xe/0x10 [zfs]
[114187.952177]  ? destroy_inode+0x3e/0x60
[114187.952178]  ? evict+0x139/0x1a0
[114187.952179]  ? iput+0x148/0x210
[114187.952181]  ? insert_inode_locked+0x1d8/0x1e0
[114187.952207]  zfs_znode_alloc+0x625/0x680 [zfs]
[114187.952233]  zfs_zget+0x1ad/0x240 [zfs]
[114187.952260]  zfs_unlinked_drain_task+0x74/0x100 [zfs]
[114187.952262]  ? __switch_to_asm+0x34/0x70
[114187.952264]  ? __switch_to_asm+0x40/0x70
[114187.952265]  ? __switch_to_asm+0x34/0x70
[114187.952267]  ? __switch_to_asm+0x40/0x70

syslog gives the following:

Dec 20 01:00:03 pve kernel: [113512.912807] PANIC at zfs_znode.c:611:zfs_znode_alloc()
Dec 20 01:00:03 pve kernel: [113512.912811] CPU: 4 PID: 1975 Comm: z_unlinked_drai Tainted: P           O      5.0.15-1-pve #1
Dec 20 01:00:03 pve kernel: [113512.912815] Call Trace:
Dec 20 01:00:03 pve kernel: [113512.912830]  spl_dumpstack+0x29/0x2b [spl]
Dec 20 01:00:03 pve kernel: [113512.912837]  ? spl_kmem_cache_free+0xc0/0x1d0 [spl]
Dec 20 01:00:03 pve kernel: [113512.912901]  ? zpl_inode_destroy+0xe/0x10 [zfs]
Dec 20 01:00:03 pve kernel: [113512.912905]  ? evict+0x139/0x1a0
Dec 20 01:00:03 pve kernel: [113512.912908]  ? insert_inode_locked+0x1d8/0x1e0
Dec 20 01:00:03 pve kernel: [113512.912958]  zfs_zget+0x1ad/0x240 [zfs]
Dec 20 01:00:03 pve kernel: [113512.912987]  ? __switch_to_asm+0x34/0x70
Dec 20 01:00:03 pve kernel: [113512.912990]  ? __switch_to_asm+0x34/0x70
Dec 20 01:00:03 pve kernel: [113512.912991]  ? __switch_to_asm+0x40/0x70
Dec 20 01:00:03 pve kernel: [113512.912993]  ? __switch_to_asm+0x34/0x70
Dec 20 01:00:03 pve kernel: [113512.912996]  ? finish_task_switch+0xcd/0x270
Dec 20 01:00:03 pve kernel: [113512.912998]  ? __schedule+0x2dc/0x870
Dec 20 01:00:03 pve kernel: [113512.913000]  ? remove_wait_queue+0x4d/0x60
Dec 20 01:00:03 pve kernel: [113512.913004]  taskq_thread+0x2ec/0x4d0 [spl]
Dec 20 01:00:03 pve kernel: [113512.913007]  ? wake_up_q+0x80/0x80
Dec 20 01:00:03 pve kernel: [113512.913009]  kthread+0x120/0x140
Dec 20 01:00:03 pve kernel: [113512.913013]  ? task_done+0xb0/0xb0 [spl]
Dec 20 01:00:03 pve kernel: [113512.913015]  ? __kthread_parkme+0x70/0x70
Dec 20 01:00:03 pve kernel: [113512.913016]  ret_from_fork+0x22/0x40

I would raise a bug report with ZOL Proxmox (Edit: as it’s not offically supported, try proxmox first)
Does it always do it eventually when sending, or only sometimes?

It only does it sometimes.
Thanks, I’ll submit a bug report. Not quite sure how to make it reproducible though.

yeah.
If it was always / common / reproducable, would poke at a few things, see if maybe could tweak, but otherwise, see if Proxmox will take a bug report- they might have an answer, or might refer you up to ZOL.

They might not have any answers, but having the info would help them in case others have the issue, so the more info they have, the better…